Re: Proposed Query Planner TODO items

Lists: pgsql-hackers
From: markw(at)osdl(dot)org
To: tgl(at)sss(dot)pgh(dot)pa(dot)us, josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 17:04:54
Message-ID: 200402091704.i19H4wE24490@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6 Feb, To: tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
> On 5 Jan, Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> 2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES
>>
>>> Summary: Currently, queries with complex "or group" criteria get devolved by
>>> the planner into canonical and-or filters resulting in very poor execution on
>>> large data sets. We should find better ways of dealing with these queries,
>>> for example UNIONing.
>>
>>> Description: While helping OSDL with their derivative TPC-R benchmark, we ran
>>> into a query (#19) which took several hours to complete on PostgreSQL.

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: markw(at)osdl(dot)org
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 17:23:22
Message-ID: 21196.1076347402@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

markw(at)osdl(dot)org writes:
> http://developer.osdl.org/markw/dbt3-pgsql/

> There's a short summary of the tests I ran over the weekend, with links
> to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
> looks like query #7 had the only significant improvement. Oprofile data
> should be there too, if that'll help. Let us know if there's anything
> else we can try for you.

I couldn't figure out anything at all from that, possibly because many
of the links are dead, eg the "task" descriptions. I don't even see
where you see the time for query #7.

What would be interesting from my perspective is "explain" results (or
even better, "explain analyze" results) for the problem queries. Any
chance of extracting such a thing?

regards, tom lane


From: markw(at)osdl(dot)org
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 17:30:04
Message-ID: 200402091730.i19HUHE28069@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 Feb, Tom Lane wrote:
> markw(at)osdl(dot)org writes:
>> http://developer.osdl.org/markw/dbt3-pgsql/
>
>> There's a short summary of the tests I ran over the weekend, with links
>> to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
>> looks like query #7 had the only significant improvement. Oprofile data
>> should be there too, if that'll help. Let us know if there's anything
>> else we can try for you.
>
> I couldn't figure out anything at all from that, possibly because many
> of the links are dead, eg the "task" descriptions. I don't even see
> where you see the time for query #7.
>
> What would be interesting from my perspective is "explain" results (or
> even better, "explain analyze" results) for the problem queries. Any
> chance of extracting such a thing?

Sorry about the task links, I think I've got that corrected.

I'll see what I can do about the "explain" and "explain analyze"
results. I remember in the past that someone said it would be most
interesting to execute the latter while the test while running, as
opposed to before or after a test. Should I do that here too?

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: markw(at)osdl(dot)org
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 17:33:00
Message-ID: 21284.1076347980@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

markw(at)osdl(dot)org writes:
> I'll see what I can do about the "explain" and "explain analyze"
> results. I remember in the past that someone said it would be most
> interesting to execute the latter while the test while running, as
> opposed to before or after a test. Should I do that here too?

If possible, but I'd settle for a standalone result, so long as it's
executed against the correct database contents (including pg_statistic
settings).

regards, tom lane


From: markw(at)osdl(dot)org
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 17:48:39
Message-ID: 200402091748.i19HmwE31410@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 Feb, Tom Lane wrote:
> markw(at)osdl(dot)org writes:
>> I'll see what I can do about the "explain" and "explain analyze"
>> results. I remember in the past that someone said it would be most
>> interesting to execute the latter while the test while running, as
>> opposed to before or after a test. Should I do that here too?
>
> If possible, but I'd settle for a standalone result, so long as it's
> executed against the correct database contents (including pg_statistic
> settings).

Ok, I've found that the kit does capture "explain" results and I've
added a "Query Plans" links under the query time charts on each of the
pages. Um, but I did notice a couple of problems. It looks liks one of
the 22 queries is missing and they're not labeled. I'll see about
getting that fixed.

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: markw(at)osdl(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 18:42:08
Message-ID: 200402091042.08574.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark,

> Ok, I've found that the kit does capture "explain" results and I've
> added a "Query Plans" links under the query time charts on each of the
> pages. Um, but I did notice a couple of problems. It looks liks one of
> the 22 queries is missing and they're not labeled. I'll see about
> getting that fixed.

If #19 is missing it's because Oleg & I could not get it to complete. That
was also the query which we are most interested in testing.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: markw(at)osdl(dot)org, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 19:08:37
Message-ID: 22262.1076353717@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> If #19 is missing it's because Oleg & I could not get it to complete. That
> was also the query which we are most interested in testing.

Q19 doesn't seem to be particularly slow in either the 7.4 or 7.5 tests
--- there are many others with longer runtimes. I speculate that what
is actually being run here is a modified Q19 query with the merge join
condition pulled out by hand. The CVS-tip planner should be able to do
that for itself, though, and obtain essentially this same performance
with the per-spec query.

regards, tom lane


From: markw(at)osdl(dot)org
To: josh(at)agliodbs(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-09 19:39:41
Message-ID: 200402091939.i19JdiE14359@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 Feb, Josh Berkus wrote:
> Mark,
>
>> Ok, I've found that the kit does capture "explain" results and I've
>> added a "Query Plans" links under the query time charts on each of the
>> pages. Um, but I did notice a couple of problems. It looks liks one of
>> the 22 queries is missing and they're not labeled. I'll see about
>> getting that fixed.
>
> If #19 is missing it's because Oleg & I could not get it to complete. That
> was also the query which we are most interested in testing.

Oh, it's probably because we've altered Q19 and Q20. I'm still not all
that familiar with this kit, so I'm learning as we go. So we need to
change it back to make it worthwhile for you.

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jenny Zhang <jenny(at)osdl(dot)org>, markw(at)osdl(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-10 00:53:14
Message-ID: 200402091653.14528.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jenny,

> For 19, we moved the common conditions out of the big ORs, for 20, we
> added distinct. We can change the query back if the optimizer can
> handle it now.

Well, we want to test if it can.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Jenny Zhang <jenny(at)osdl(dot)org>
To: markw(at)osdl(dot)org
Cc: josh(at)agliodbs(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-10 00:55:50
Message-ID: 1076374550.14046.47.camel@ibm-a.pdx.osdl.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2004-02-09 at 11:39, markw(at)osdl(dot)org wrote:
> On 9 Feb, Josh Berkus wrote:
> > Mark,
> >
> >> Ok, I've found that the kit does capture "explain" results and I've
> >> added a "Query Plans" links under the query time charts on each of the
> >> pages. Um, but I did notice a couple of problems. It looks liks one of
> >> the 22 queries is missing and they're not labeled. I'll see about
> >> getting that fixed.
> >
> > If #19 is missing it's because Oleg & I could not get it to complete. That
> > was also the query which we are most interested in testing.
>
> Oh, it's probably because we've altered Q19 and Q20. I'm still not all
> that familiar with this kit, so I'm learning as we go. So we need to
> change it back to make it worthwhile for you.
>
The missing one is query 15, since it has create view, and I can not get
execution plan for that query.

I started the kit on PG 7.3. We can not finish query 19 and 20 at that
time. So I rewrote 19 and 20:
For 19, we moved the common conditions out of the big ORs, for 20, we
added distinct. We can change the query back if the optimizer can
handle it now.

Jenny


From: Jenny Zhang <jenny(at)osdl(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: markw(at)osdl(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-10 01:10:25
Message-ID: 1076375425.14046.56.camel@ibm-a.pdx.osdl.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2004-02-09 at 16:53, Josh Berkus wrote:
> Jenny,
>
> > For 19, we moved the common conditions out of the big ORs, for 20, we
> > added distinct. We can change the query back if the optimizer can
> > handle it now.
>
> Well, we want to test if it can.
Replace the file 19.sql under datagen/pgsql-queries with the attachment
should do it.

Jenny

Attachment Content-Type Size
19-h.sql text/x-sql 1.0 KB

From: markw(at)osdl(dot)org
To: tgl(at)sss(dot)pgh(dot)pa(dot)us, josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-12 17:17:45
Message-ID: 200402121717.i1CHHoE19735@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ok, I have EXPLAIN ANALYZE results for both the power and throughput
tests:
http://developer.osdl.org/markw/dbt3-pgsql/

It's run #60 and the links are towards the bottom of the page under the
"Run log data" heading. The results from the power test is
"power_query.result" and "thuput_qs1.result", etc. for each stream in
the throughput test.

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: markw(at)osdl(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-12 17:29:38
Message-ID: 200402120929.38705.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark,

> It's run #60 and the links are towards the bottom of the page under the
> "Run log data" heading. The results from the power test is
> "power_query.result" and "thuput_qs1.result", etc. for each stream in
> the throughput test.

I'm confused. Were you able to get the original-form query #19 to complete,
or not?

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: markw(at)osdl(dot)org
To: josh(at)agliodbs(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-12 17:32:35
Message-ID: 200402121732.i1CHWcE22082@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 Feb, Josh Berkus wrote:
> Mark,
>
>> It's run #60 and the links are towards the bottom of the page under the
>> "Run log data" heading. The results from the power test is
>> "power_query.result" and "thuput_qs1.result", etc. for each stream in
>> the throughput test.
>
> I'm confused. Were you able to get the original-form query #19 to complete,
> or not?

Oh sorry, I completely forgot that Q19 the whole purpose of this. So
#60 doesn't have the right Q19. I'll run with the one you want now.

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: markw(at)osdl(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-12 17:35:15
Message-ID: 200402120935.15275.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark,

> Oh sorry, I completely forgot that Q19 the whole purpose of this. So
> #60 doesn't have the right Q19. I'll run with the one you want now.

Thanks! And the original, not the "fixed", Q19 if you please. It's the
original that wouldn't finish on Postgres 7.3.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: markw(at)osdl(dot)org
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-12 18:50:24
Message-ID: 13571.1076611824@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

markw(at)osdl(dot)org writes:
> Ok, I have EXPLAIN ANALYZE results for both the power and throughput
> tests:
> http://developer.osdl.org/markw/dbt3-pgsql/

Thanks. I just looked at Q9 and Q21, since those are the slowest
queries according to your chart. (Are all the queries weighted the same
for evaluation purposes, or are some more important than others?)

The problem with Q9 seems to be an estimation failure:

-> Nested Loop (cost=0.00..437591.67 rows=92 width=74) (actual time=12.030..1603892.783 rows=681518 loops=1)
-> Nested Loop (cost=0.00..65364.57 rows=61720 width=43) (actual time=0.326..5667.573 rows=90676 loops=1)
-> Seq Scan on part (cost=0.00..15733.27 rows=15992 width=11) (actual time=0.183..1539.306 rows=22669 loops=1)
Filter: ((p_name)::text ~~ '%hot%'::text)
-> Index Scan using i_ps_partkey on partsupp (cost=0.00..3.05 rows=4 width=32) (actual time=0.119..0.151 rows=4 loops=22669)
Index Cond: ("outer".p_partkey = partsupp.ps_partkey)
-> Index Scan using i_l_suppkey_partkey on lineitem (cost=0.00..6.02 rows=1 width=64) (actual time=2.183..17.564 rows=8 loops=90676)
Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND ("outer".ps_suppkey = lineitem.l_suppkey))

The estimate for the part/partsupp join is close enough (60K vs 90K
rows), but why is it estimating 92 rows out of the join to lineitem when
the true figure is 681518? With a more accurate estimate the planner
would probably have chosen different join methods above this point.

Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
ps_suppkey, and l_suppkey?

It would also be interesting to see whether a better estimate emerges
if you increase default_statistics_target (try 100 or so).

Q21 is a more interesting case:

EXPLAIN ANALYZE
select s_name, count(*) as numwait
from supplier, lineitem l1, orders, nation
where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate
and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey )
and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate )
and s_nationkey = n_nationkey and n_name = 'MOROCCO'
group by s_name
order by numwait desc, s_name
LIMIT 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2984783.51..2984783.76 rows=100 width=29) (actual time=1490860.249..1490860.460 rows=100 loops=1)
-> Sort (cost=2984783.51..2984831.91 rows=19361 width=29) (actual time=1490860.244..1490860.320 rows=100 loops=1)
Sort Key: count(*), supplier.s_name
-> HashAggregate (cost=2983356.52..2983404.92 rows=19361 width=29) (actual time=1490853.802..1490856.472 rows=760 loops=1)
-> Nested Loop (cost=0.00..2983259.72 rows=19361 width=29) (actual time=350.991..1490777.523 rows=7471 loops=1)
-> Nested Loop (cost=0.00..2862119.72 rows=40000 width=40) (actual time=350.805..1453771.752 rows=15369 loops=1)
-> Nested Loop (cost=0.00..994.08 rows=802 width=40) (actual time=0.152..187.510 rows=760 loops=1)
Join Filter: ("inner".s_nationkey = "outer".n_nationkey)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=9) (actual time=0.088..0.113 rows=1 loops=1)
Filter: (n_name = 'MOROCCO'::bpchar)
-> Seq Scan on supplier (cost=0.00..742.34 rows=20034 width=49) (actual time=0.010..136.902 rows=20000 loops=1)
-> Index Scan using i_l_suppkey on lineitem l1 (cost=0.00..3566.81 rows=54 width=21) (actual time=87.928..1912.454 rows=20 loops=760)
Index Cond: ("outer".s_suppkey = l1.l_suppkey)
Filter: ((l_receiptdate > l_commitdate) AND (subplan) AND (NOT (subplan)))
SubPlan
-> Index Scan using i_l_orderkey on lineitem l3 (cost=0.00..3.13 rows=3 width=178) (actual time=0.066..0.066 rows=1 loops=277343)
Index Cond: (l_orderkey = $0)
Filter: ((l_suppkey <> $1) AND (l_receiptdate > l_commitdate))
-> Index Scan using i_l_orderkey on lineitem l2 (cost=0.00..3.11 rows=7 width=178) (actual time=0.812..0.812 rows=1 loops=287821)
Index Cond: (l_orderkey = $0)
Filter: (l_suppkey <> $1)
-> Index Scan using orders_pkey on orders (cost=0.00..3.02 rows=1 width=11) (actual time=2.397..2.399 rows=0 loops=15369)
Index Cond: (orders.o_orderkey = "outer".l_orderkey)
Filter: (o_orderstatus = 'F'::bpchar)
Total runtime: 1490867.126 ms
(25 rows)

I think the key issue here is that the two EXISTS tests depend only on
l1.l_orderkey and l1.l_suppkey of the outer query. Therefore they get
"pushed down" in the plan tree to be evaluated during the initial scan
of l1. This is normally a good heuristic choice, but because the EXISTS
tests are relatively expensive, that ends up forcing the planner to use
a nestloop-with-inner-index-scan join between nation/supplier and l1.
Any other join technique will involve a seqscan of l1 causing the EXISTS
tests to be evaluated at every row of lineitem; the planner correctly
ranks those alternatives as even worse than this.

The trouble is that the nestloop is hugely expensive: you can see that
the repeated indexscans on l1 take up 1912.454*760 - 0.066*277343 -
0.812*287821 or 1201449.750 msec, about 80% of the total.

It seems that the correct way to plan this query would require
postponing evaluation of the EXISTS clauses. If those were further up
the tree, the planner would have chosen a merge or hash join at this
step, which would probably take a tenth as much time. The cost to run
the EXISTS clauses themselves wouldn't change; they'd not be executed
any more frequently in this case.

I recall seeing traces in the code of logic that would attempt to delay
the evaluation of expensive WHERE tests, but that's been gone since
Berkeley days. Perhaps we should think about resurrecting it, or at
least putting in some kind of heuristic to try to cope better with this
case.

It would be interesting to see what the runtime looks like if you add
the following to the WHERE clauses of both inner EXISTS:
AND s_nationkey = n_nationkey AND o_orderkey = l1.l_orderkey
This would not change the results AFAICS, but it would force the
evaluation of the EXISTS clauses up to the top level of the outer plan
(since the planner would then see 'em as join constraints).

regards, tom lane


From: markw(at)osdl(dot)org
To: josh(at)agliodbs(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-12 23:50:33
Message-ID: 200402122350.i1CNoaE06063@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 Feb, Josh Berkus wrote:
> Mark,
>
>> Oh sorry, I completely forgot that Q19 the whole purpose of this. So
>> #60 doesn't have the right Q19. I'll run with the one you want now.
>
> Thanks! And the original, not the "fixed", Q19 if you please. It's the
> original that wouldn't finish on Postgres 7.3.

Josh,

http://developer.osdl.org/markw/dbt3-pgsql/

Check out #61. I replaced the Q19 template with the one Jenny sent out.
Looks like it ran just fine. This run also has the EXPLAIN ANALYZE
results, but none of the other things Tom has asked for yet.

Mark


From: Dennis Haney <davh(at)diku(dot)dk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-13 08:27:31
Message-ID: 402C8A73.20804@diku.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

You are refering to:

@inproceedings{ hellerstein93predicate,
author = "Joseph M. Hellerstein and Michael Stonebraker",
title = "Predicate migration: optimizing queries with expensive
predicates",
pages = "267--276",
year = "1993",
abstract = "The traditional focus of relational query optimization
schemes has been on the choice of join methods and join orders.
Restrictions have typically been handled in query optimizers by
"predicate pushdown" rules, which apply restrictions in some random
order before as many joins as possible. These rules work under the
assumption that restriction is essentially a zero-time operation.
However, today's extensible and object-oriented database systems allow
users to define time-consuming functions,...",
url = "citeseer.nj.nec.com/article/hellerstein92predicate.html" }

Tom Lane wrote:

>I think the key issue here is that the two EXISTS tests depend only on
>l1.l_orderkey and l1.l_suppkey of the outer query. Therefore they get
>"pushed down" in the plan tree to be evaluated during the initial scan
>of l1. This is normally a good heuristic choice, but because the EXISTS
>tests are relatively expensive, that ends up forcing the planner to use
>a nestloop-with-inner-index-scan join between nation/supplier and l1.
>Any other join technique will involve a seqscan of l1 causing the EXISTS
>tests to be evaluated at every row of lineitem; the planner correctly
>ranks those alternatives as even worse than this.
>
>The trouble is that the nestloop is hugely expensive: you can see that
>the repeated indexscans on l1 take up 1912.454*760 - 0.066*277343 -
>0.812*287821 or 1201449.750 msec, about 80% of the total.
>
>It seems that the correct way to plan this query would require
>postponing evaluation of the EXISTS clauses. If those were further up
>the tree, the planner would have chosen a merge or hash join at this
>step, which would probably take a tenth as much time. The cost to run
>the EXISTS clauses themselves wouldn't change; they'd not be executed
>any more frequently in this case.
>
>I recall seeing traces in the code of logic that would attempt to delay
>the evaluation of expensive WHERE tests, but that's been gone since
>Berkeley days. Perhaps we should think about resurrecting it, or at
>least putting in some kind of heuristic to try to cope better with this
>case.
>
>It would be interesting to see what the runtime looks like if you add
>the following to the WHERE clauses of both inner EXISTS:
> AND s_nationkey = n_nationkey AND o_orderkey = l1.l_orderkey
>This would not change the results AFAICS, but it would force the
>evaluation of the EXISTS clauses up to the top level of the outer plan
>(since the planner would then see 'em as join constraints).
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>

--
Dennis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dennis Haney <davh(at)diku(dot)dk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-13 14:33:34
Message-ID: 1938.1076682814@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dennis Haney <davh(at)diku(dot)dk> writes:
> You are refering to:
> @inproceedings{ hellerstein93predicate,
> author = "Joseph M. Hellerstein and Michael Stonebraker",
> title = "Predicate migration: optimizing queries with expensive
> predicates",

Yup, I sure am. This is the same thesis referred to here:
http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php

We may need to put some of it back ;-)

regards, tom lane


From: markw(at)osdl(dot)org
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-13 16:52:05
Message-ID: 200402131652.i1DGqHE14713@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 Feb, Tom Lane wrote:
> markw(at)osdl(dot)org writes:
>> Ok, I have EXPLAIN ANALYZE results for both the power and throughput
>> tests:
>> http://developer.osdl.org/markw/dbt3-pgsql/
>
> Thanks. I just looked at Q9 and Q21, since those are the slowest
> queries according to your chart. (Are all the queries weighted the same
> for evaluation purposes, or are some more important than others?)
>
[snip]
>
> The estimate for the part/partsupp join is close enough (60K vs 90K
> rows), but why is it estimating 92 rows out of the join to lineitem when
> the true figure is 681518? With a more accurate estimate the planner
> would probably have chosen different join methods above this point.
>
> Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
> ps_suppkey, and l_suppkey?
>
> It would also be interesting to see whether a better estimate emerges
> if you increase default_statistics_target (try 100 or so).

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.) Pretty significant performance change.

Power:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.ps_suppkey.out

Throughput:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.ps_suppkey.out

Something went wrong when I tried to run another test with the Q21
changes overnight, so I'll have to get back to you on that one.

Mark


From: Dennis Haney <davh(at)diku(dot)dk>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-16 16:17:50
Message-ID: 4030ED2E.5000902@diku.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

markw(at)osdl(dot)org wrote:

>On 12 Feb, Tom Lane wrote:
>
>
>http://developer.osdl.org/markw/dbt3-pgsql/62/
>
>This run changes default_statistics_target to 1000 and I have p_partkey,
>l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
>http (no links on the web page.) Pretty significant performance change.
>
>
>
Why the filesystem change to ext2 at the same time?

>Something went wrong when I tried to run another test with the Q21
>changes overnight, so I'll have to get back to you on that one.
>
>

--
Dennis


From: markw(at)osdl(dot)org
To: davh(at)diku(dot)dk
Cc: pgsql-hackers(at)postgresql(dot)org, jenny(at)osdl(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-16 16:26:53
Message-ID: 200402161626.i1GGQvE23197@mail.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 Feb, Dennis Haney wrote:
> markw(at)osdl(dot)org wrote:
>
>>On 12 Feb, Tom Lane wrote:
>>
>>
>>http://developer.osdl.org/markw/dbt3-pgsql/62/
>>
>>This run changes default_statistics_target to 1000 and I have p_partkey,
>>l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
>>http (no links on the web page.) Pretty significant performance change.
>>
>>
>>
> Why the filesystem change to ext2 at the same time?

I've been rotating filesystems occasionally. Otherwise no specific
reason.

Mark


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-10-25 04:59:46
Message-ID: 20041025.135946.39154534.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark,

I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?
--
Tatsuo Ishii

> On 6 Feb, To: tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
> > On 5 Jan, Tom Lane wrote:
> >> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >>> 2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES
> >>
> >>> Summary: Currently, queries with complex "or group" criteria get devolved by
> >>> the planner into canonical and-or filters resulting in very poor execution on
> >>> large data sets. We should find better ways of dealing with these queries,
> >>> for example UNIONing.
> >>
> >>> Description: While helping OSDL with their derivative TPC-R benchmark, we ran
> >>> into a query (#19) which took several hours to complete on PostgreSQL.
>
> http://developer.osdl.org/markw/dbt3-pgsql/
>
> There's a short summary of the tests I ran over the weekend, with links
> to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
> looks like query #7 had the only significant improvement. Oprofile data
> should be there too, if that'll help. Let us know if there's anything
> else we can try for you.
>
> Mark
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>


From: Reini Urban <rurban(at)x-ray(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-10-25 10:18:21
Message-ID: 417CD2ED.3080600@x-ray.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tatsuo Ishii schrieb:
> I see nice graphs for each DBT3 query(for example,
> http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
> they do not come with normal dbt3-1.4 kit. How did you get them?
> Maybe you have slightly modified dbt3 kit?

This looks like a simple ploticus one-liner.
like:
pl -png -o vbars.png -prefab vbars data=dbt3.data x=1 y=2 barwidth=line

see for example: http://ploticus.sourceforge.net/doc/prefab_vbars.html
or
http://phpwiki.sourceforge.net/phpwiki/PhpMemoryExhausted/Testresults
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/


From: Mark Wong <markw(at)osdl(dot)org>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-10-25 15:16:08
Message-ID: 20041025081608.A13520@osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Tatsuo,

Yes, I've been updating the dbt3 kit over the past several months.
The query time graph is a new feature. It's available via BitKeeper
at bk://developer.osdl.org:/var/bk/dbt3 but I haven't tested the kit
well enough to make a v1.5 release yet. If BitKeeper isn't something
you can use, I can make a preliminary tarball for you.

Mark

On Mon, Oct 25, 2004 at 01:59:46PM +0900, Tatsuo Ishii wrote:
> Mark,
>
> I see nice graphs for each DBT3 query(for example,
> http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
> they do not come with normal dbt3-1.4 kit. How did you get them?
> Maybe you have slightly modified dbt3 kit?
> --
> Tatsuo Ishii
>
> > On 6 Feb, To: tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
> > > On 5 Jan, Tom Lane wrote:
> > >> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > >>> 2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES
> > >>
> > >>> Summary: Currently, queries with complex "or group" criteria get devolved by
> > >>> the planner into canonical and-or filters resulting in very poor execution on
> > >>> large data sets. We should find better ways of dealing with these queries,
> > >>> for example UNIONing.
> > >>
> > >>> Description: While helping OSDL with their derivative TPC-R benchmark, we ran
> > >>> into a query (#19) which took several hours to complete on PostgreSQL.
> >
> > http://developer.osdl.org/markw/dbt3-pgsql/
> >
> > There's a short summary of the tests I ran over the weekend, with links
> > to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
> > looks like query #7 had the only significant improvement. Oprofile data
> > should be there too, if that'll help. Let us know if there's anything
> > else we can try for you.
> >
> > Mark
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
> >

--
Mark Wong - - markw(at)osdl(dot)org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436 (fax)
http://developer.osdl.org/markw/


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-10-26 03:44:49
Message-ID: 20041026.124449.13773317.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Thanks for the info. Would you give me the tarball?
--
Tatsuo Ishii

> Hi Tatsuo,
>
> Yes, I've been updating the dbt3 kit over the past several months.
> The query time graph is a new feature. It's available via BitKeeper
> at bk://developer.osdl.org:/var/bk/dbt3 but I haven't tested the kit
> well enough to make a v1.5 release yet. If BitKeeper isn't something
> you can use, I can make a preliminary tarball for you.
>
> Mark
>
> On Mon, Oct 25, 2004 at 01:59:46PM +0900, Tatsuo Ishii wrote:
> > Mark,
> >
> > I see nice graphs for each DBT3 query(for example,
> > http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
> > they do not come with normal dbt3-1.4 kit. How did you get them?
> > Maybe you have slightly modified dbt3 kit?
> > --
> > Tatsuo Ishii
> >
> > > On 6 Feb, To: tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
> > > > On 5 Jan, Tom Lane wrote:
> > > >> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > > >>> 2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES
> > > >>
> > > >>> Summary: Currently, queries with complex "or group" criteria get devolved by
> > > >>> the planner into canonical and-or filters resulting in very poor execution on
> > > >>> large data sets. We should find better ways of dealing with these queries,
> > > >>> for example UNIONing.
> > > >>
> > > >>> Description: While helping OSDL with their derivative TPC-R benchmark, we ran
> > > >>> into a query (#19) which took several hours to complete on PostgreSQL.
> > >
> > > http://developer.osdl.org/markw/dbt3-pgsql/
> > >
> > > There's a short summary of the tests I ran over the weekend, with links
> > > to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
> > > looks like query #7 had the only significant improvement. Oprofile data
> > > should be there too, if that'll help. Let us know if there's anything
> > > else we can try for you.
> > >
> > > Mark
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 2: you can get off all lists at once with the unregister command
> > > (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
> > >
>
> --
> Mark Wong - - markw(at)osdl(dot)org
> Open Source Development Lab Inc - A non-profit corporation
> 12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
> (503) 626-2455 x 32 (office)
> (503) 626-2436 (fax)
> http://developer.osdl.org/markw/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>


From: Mark Wong <markw(at)osdl(dot)org>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-10-28 16:11:10
Message-ID: 20041028091109.A24826@osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Tatsuo,

I've made a new release:
http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download

Let me know if there are any problems.

On Tue, Oct 26, 2004 at 12:44:49PM +0900, Tatsuo Ishii wrote:
> Hi,
>
> Thanks for the info. Would you give me the tarball?


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-10-29 00:57:12
Message-ID: 20041029.095712.35013127.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Hi Tatsuo,
>
> I've made a new release:
> http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download
>
> Let me know if there are any problems.

Thanks!

> On Tue, Oct 26, 2004 at 12:44:49PM +0900, Tatsuo Ishii wrote:
> > Hi,
> >
> > Thanks for the info. Would you give me the tarball?
>


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Query Planner TODO items
Date: 2004-11-04 09:18:53
Message-ID: 20041104.181853.15267765.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > Hi Tatsuo,
> >
> > I've made a new release:
> > http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download
> >
> > Let me know if there are any problems.
>
> Thanks!

Just for quick note, it seems query 19 takes forever. Have you
successfully run Q19?
--
Tatsuo Ishii


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: DBT-3 v1.5 Q19 (Re: Proposed Query Planner TODO items)
Date: 2004-11-04 09:45:24
Message-ID: 20041104.184524.26271633.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > > Hi Tatsuo,
> > >
> > > I've made a new release:
> > > http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download
> > >
> > > Let me know if there are any problems.
> >
> > Thanks!
>
> Just for quick note, it seems query 19 takes forever. Have you
> successfully run Q19?

Here is the more detailed info. The query was not finished within 3
days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
Linux. PostgreSQL is 7.4.5 with default postgresql.conf. An explain
output is attatched.
--
Tatsuo Ishii

Attachment Content-Type Size
unknown_filename text/plain 27.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: markw(at)osdl(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: DBT-3 v1.5 Q19 (Re: Proposed Query Planner TODO items)
Date: 2004-11-04 15:08:32
Message-ID: 25372.1099580912@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
>> Just for quick note, it seems query 19 takes forever. Have you
>> successfully run Q19?

> Here is the more detailed info. The query was not finished within 3
> days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
> Linux. PostgreSQL is 7.4.5 with default postgresql.conf.

7.4's planner is not able to do anything useful with the complicated
WHERE clause in Q19. I believe I've improved that situation for 8.0.
A really slick solution will probably have to await the appearance of
bitmap indexes, though.

regards, tom lane


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: markw(at)osdl(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: DBT-3 v1.5 Q19
Date: 2004-11-04 15:17:18
Message-ID: 20041105.001718.45516682.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> >> Just for quick note, it seems query 19 takes forever. Have you
> >> successfully run Q19?
>
> > Here is the more detailed info. The query was not finished within 3
> > days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
> > Linux. PostgreSQL is 7.4.5 with default postgresql.conf.
>
> 7.4's planner is not able to do anything useful with the complicated
> WHERE clause in Q19. I believe I've improved that situation for 8.0.
> A really slick solution will probably have to await the appearance of
> bitmap indexes, though.

Thanks. I will try with 8.0.
--
Tatsuo Ishii