Re: Query running slower than same on Oracle

Lists: pgsql-performance
From: "Sailer, Denis (YBUSA-CDR)" <Denis(dot)Sailer(at)Yellowbook(dot)com>
To: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Query running slower than same on Oracle
Date: 2003-06-25 21:33:16
Message-ID: 58D6EF5727036048BC9E08ECB711A42092ECD0@crmsmail.corp.ybusa.net

We are evaluating PostgreSQL for a typical data warehouse application. I
have 3 tables below that are part of a star schema design. The query listed
below runs in 16 seconds on Oracle 9.2 and in 3+ minutes on PostgreSQL 7.3.3.
Here are the details.

I'm wondering what else can be done to tune this type of query. Is 3
minutes reasonable given the amount of data that is loaded into the 3
tables? Is there anyone else who has made comparisons between Oracle and
PostgreSQL?

----------------------------------------------------------------------------
Oracle 9.2 is running on a windows/2000 server, 600MHz PIII, 512MB ram
Shared Pool 48MB
Buffer Cache 98MB
Large Pool 8MB
Java Pool 32MB
=========
Total SGA 186MB

----------------------------------------------------------------------------
PostgreSQL is running on Redhat Linux 7.2, 733MHz PIII processor, 383MB ram.
shared_buffers = 12384 (96 MB)
sort_mem = 16384

----------------------------------------------------------------------------

explain analyze
select fiscalyearquarter, description, sum(amount_quantity)
from time t, revenue r, statistic s
where t.fiscalyear = 2002
  and r.timekey = t.timekey
  and r.statisticskey = s.statisticskey
group by fiscalyearquarter, description;

QUERY PLAN
----------------------------------------------------------------------------
Aggregate  (cost=124685.74..127078.87 rows=23931 width=48) (actual time=170682.53..189640.85 rows=8 loops=1)
  ->  Group  (cost=124685.74..126480.59 rows=239313 width=48) (actual time=169508.49..185478.90 rows=1082454 loops=1)
        ->  Sort  (cost=124685.74..125284.02 rows=239313 width=48) (actual time=169508.47..171853.03 rows=1082454 loops=1)
              Sort Key: t.fiscalyearquarter, s.description
              ->  Hash Join  (cost=6.46..94784.90 rows=239313 width=48) (actual time=140.20..47685.46 rows=1082454 loops=1)
                    Hash Cond: ("outer".statisticskey = "inner".statisticskey)
                    ->  Hash Join  (cost=5.43..90595.90 rows=239313 width=32) (actual time=139.96..39672.76 rows=1082454 loops=1)
                          Hash Cond: ("outer".timekey = "inner".timekey)
                          ->  Seq Scan on revenue r  (cost=0.00..68454.04 rows=3829004 width=17) (actual time=0.01..26336.95 rows=3829004 loops=1)
                          ->  Hash  (cost=5.40..5.40 rows=12 width=15) (actual time=0.79..0.79 rows=0 loops=1)
                                ->  Seq Scan on "time" t  (cost=0.00..5.40 rows=12 width=15) (actual time=0.36..0.75 rows=12 loops=1)
                                      Filter: (fiscalyear = 2002::numeric)
                    ->  Hash  (cost=1.02..1.02 rows=2 width=16) (actual time=0.04..0.04 rows=0 loops=1)
                          ->  Seq Scan on statistic s  (cost=0.00..1.02 rows=2 width=16) (actual time=0.02..0.03 rows=2 loops=1)
Total runtime: 195409.79 msec

This gives you an idea of the size of each table in the query:
----------------------------------------------------------------------------

pubnet=# vacuum analyze verbose revenue;
INFO: --Relation dw.revenue--
INFO: Pages 30164: Changed 0, Empty 0; Tup 3829004: Vac 0, Keep 0, UnUsed 17.
        Total CPU 1.87s/0.73u sec elapsed 9.97 sec.
INFO: Analyzing dw.revenue
VACUUM
pubnet=# vacuum analyze verbose statistic;
INFO: --Relation dw.statistic--
INFO: Pages 1: Changed 0, Empty 0; Tup 2: Vac 0, Keep 0, UnUsed 1.
Total CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: Analyzing dw.statistic
VACUUM
pubnet=# vacuum analyze verbose time;
INFO: --Relation dw.time--
INFO: Pages 3: Changed 0, Empty 0; Tup 192: Vac 0, Keep 0, UnUsed 33.
Total CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: Analyzing dw.time
VACUUM
pubnet=#

I tried disabling the use of hash joins to see what might happen. This
causes the optimizer to use a merge join instead, and the timings are worse.
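For anyone wanting to reproduce this experiment, the planner can be steered
away from hash joins for a single session with the standard enable_* settings
(a sketch; the query is the same one shown above):

```sql
-- Session-local: discourage hash joins so the planner falls back to a merge join.
SET enable_hashjoin TO off;

EXPLAIN ANALYZE
select fiscalyearquarter, description, sum(amount_quantity)
from time t, revenue r, statistic s
where t.fiscalyear = 2002
  and r.timekey = t.timekey
  and r.statisticskey = s.statisticskey
group by fiscalyearquarter, description;

-- Restore the default afterwards.
SET enable_hashjoin TO on;
```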

Here is the plan for that


QUERY PLAN

----------------------------------------------------------------------------
Aggregate  (cost=665570.44..667963.57 rows=23931 width=48) (actual time=362121.97..381081.18 rows=8 loops=1)
  ->  Group  (cost=665570.44..667365.29 rows=239313 width=48) (actual time=360948.51..376904.14 rows=1082454 loops=1)
        ->  Sort  (cost=665570.44..666168.72 rows=239313 width=48) (actual time=360948.48..363285.85 rows=1082454 loops=1)
              Sort Key: t.fiscalyearquarter, s.description
              ->  Merge Join  (cost=631481.61..635669.60 rows=239313 width=48) (actual time=263257.77..276625.27 rows=1082454 loops=1)
                    Merge Cond: ("outer".statisticskey = "inner".statisticskey)
                    ->  Sort  (cost=631480.58..632078.86 rows=239313 width=32) (actual time=260561.38..264151.04 rows=1082454 loops=1)
                          Sort Key: r.statisticskey
                          ->  Merge Join  (cost=587963.25..610099.74 rows=239313 width=32) (actual time=217380.88..231958.36 rows=1082454 loops=1)
                                Merge Cond: ("outer".timekey = "inner".timekey)
                                ->  Sort  (cost=5.62..5.65 rows=12 width=15) (actual time=14.90..14.92 rows=12 loops=1)
                                      Sort Key: t.timekey
                                      ->  Seq Scan on "time" t  (cost=0.00..5.40 rows=12 width=15) (actual time=13.47..14.83 rows=12 loops=1)
                                            Filter: (fiscalyear = 2002::numeric)
                                ->  Sort  (cost=587957.63..597530.14 rows=3829004 width=17) (actual time=214776.92..224634.94 rows=1455997 loops=1)
                                      Sort Key: r.timekey
                                      ->  Seq Scan on revenue r  (cost=0.00..68454.04 rows=3829004 width=17) (actual time=1.33..31014.95 rows=3829004 loops=1)
                    ->  Sort  (cost=1.03..1.03 rows=2 width=16) (actual time=2696.35..3765.93 rows=541228 loops=1)
                          Sort Key: s.statisticskey
                          ->  Seq Scan on statistic s  (cost=0.00..1.02 rows=2 width=16) (actual time=19.50..19.52 rows=2 loops=1)
Total runtime: 385939.85 msec

The query plan in Oracle looks like this:
----------------------------------------------------------------------------
SORT GROUP BY
  HASH JOIN
    MERGE JOIN CARTESIAN
      TABLE ACCESS FULL DBA_ADMIN.STATISTIC
      BUFFER SORT
        TABLE ACCESS FULL DBA_ADMIN.TIME
    TABLE ACCESS FULL DBA_ADMIN.REVENUE


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Sailer, Denis (YBUSA-CDR)" <Denis(dot)Sailer(at)Yellowbook(dot)com>
Cc: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query running slower than same on Oracle
Date: 2003-06-25 21:50:45
Message-ID: 18325.1056577845@sss.pgh.pa.us

"Sailer, Denis (YBUSA-CDR)" <Denis(dot)Sailer(at)Yellowbook(dot)com> writes:
> We are evaluating PostgreSQL for a typical data warehouse application. I
> have 3 tables below that are part of a Star schema design. The query listed
> below runs in 16 seconds on Oracle 9.2 and 3+ minutes on PostgreSQL 7.3.3
> Here are the details.

The majority of the runtime seems to be going into the sort step. There
is not much to be done about this in 7.3, but 7.4 should use a hashed
aggregation approach for this query, which'd eliminate the sort step and
hopefully reduce the time a great deal. Since you're only doing
evaluation at this point, it might be worth your while to try out CVS
tip ...
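As a stopgap on 7.3, since the sort dominates the runtime, it may also be
worth giving the sort more memory for just the reporting session (the value
below is only an illustrative guess, not a tested recommendation):

```sql
-- Session-local; does not touch postgresql.conf.
SET sort_mem = 65536;   -- 64 MB, up from the configured 16384
```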

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Sailer, Denis (YBUSA-CDR)" <Denis(dot)Sailer(at)Yellowbook(dot)com>, "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query running slower than same on Oracle
Date: 2003-06-25 21:51:33
Message-ID: 200306251451.33928.josh@agliodbs.com

Denis,

> I'm wondering what else can be done to tune this type of query. Is 3
> minutes reasonable given the amount of data that is loaded into the 3
> tables? Is there anyone else who has made comparisons between Oracle and
> PostgreSQL?

We will probably be a bit slower on aggregates than Oracle is, for reasons
discussed on this list ad nauseum. However, it also looks from the queries
like you forgot to index your foreign keys.
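A sketch of what that could look like (the index names are made up, and
whether these indexes actually help depends on the plans the optimizer then
chooses for this query):

```sql
CREATE INDEX revenue_timekey_idx ON revenue (timekey);
CREATE INDEX revenue_statisticskey_idx ON revenue (statisticskey);
ANALYZE revenue;
```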

--
-Josh Berkus
Aglio Database Solutions
San Francisco