Planner really hates nested loops

Lists: pgsql-performance
From: David Brown <time(at)bigpond(dot)net(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner really hates nested loops
Date: 2005-02-03 11:10:37
Message-ID: 420206AD.8040206@bigpond.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I'm hoping someone can shed some light on these results. The 'factor'
compares the ratios of cost to actual for different plans. Perhaps
nested loops should be given a discount in the planner? The estimates
seem to be out by one and a half orders of magnitude. :(

========== QUERY ==========

SELECT sum(L.Extended)
FROM sord H
JOIN sordln L USING (OrderNo)
[ WHERE H.OrderDate between '2003-01-01' and '2003-03-16' ]
[ WHERE H.OrderDate between '2003-01-01' and '2003-09-02' ]

========== SUMMARY ==========

Join Cost Cache Factor Disk Factor
------------------------------------------------------------
10% ROWS
Hash 40085 4.9s 1.0 12.8s 1.0
Merge 63338 4.1s 1.9 23.1s 0.9
Hash Idx 65386 5.5s 1.5 30.7s 0.7
Nest 257108 0.8s 39.3 2.7s 30.4
33% ROWS
Hash 43646 5.8s 1.0 13.6s 1.0
Merge 67153 6.0s 1.5 30.7s 0.7
Hash Idx 68946 6.5s 1.4
Nest 868642 2.8s 41.2 10.2s 26.5
ALL ROWS
Hash 53458 8.9s 1.0 14.3s 1.0
Merge 76156 9.4s 1.3 35.2s 0.6
Nest 2594934 9.2s 47.0 33.8s 20.5

========== 10% CACHE ROWS ==========

QUERY PLAN (Hash Join on <10% cache rows, indexed + sequential)
Aggregate (cost=40085.14..40085.14 rows=1 width=8) (actual
time=4907.000..4907.000 rows=1 loops=1)
-> Hash Join (cost=145.11..39814.32 rows=108324 width=8) (actual
time=3844.000..4735.000 rows=96183 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Seq Scan on sordln l (cost=0.00..33118.98 rows=1093398
width=12) (actual time=0.000..2313.000 rows=1093398 loops=1)
-> Hash (cost=138.48..138.48 rows=2655 width=4) (actual
time=16.000..16.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..138.48 rows=2655 width=4) (actual time=0.000..0.000
rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
Total runtime: 4907.000 ms

QUERY PLAN (Merge Join on <10% cache rows, indexed only)
Aggregate (cost=63338.43..63338.43 rows=1 width=8) (actual
time=4141.000..4141.000 rows=1 loops=1)
-> Merge Join (cost=289.47..63067.62 rows=108324 width=8) (actual
time=3000.000..3896.000 rows=96183 loops=1)
Merge Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual time=0.000..2058.000
rows=737827 loops=1)
-> Sort (cost=289.47..296.11 rows=2655 width=4) (actual
time=16.000..127.000 rows=96174 loops=1)
Sort Key: h.orderno
-> Index Scan using sord_date on sord h
(cost=0.00..138.48 rows=2655 width=4) (actual time=0.000..0.000
rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
Total runtime: 4141.000 ms

QUERY PLAN (Hash Join on <10% cache rows, indexed only)
Aggregate (cost=65385.95..65385.95 rows=1 width=8) (actual
time=5516.000..5516.000 rows=1 loops=1)
-> Hash Join (cost=145.11..65115.13 rows=108324 width=8) (actual
time=3031.000..5376.000 rows=96183 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual time=0.000..3091.000
rows=1093398 loops=1)
-> Hash (cost=138.48..138.48 rows=2655 width=4) (actual
time=0.000..0.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..138.48 rows=2655 width=4) (actual time=0.000..0.000
rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
Total runtime: 5516.000 ms

QUERY PLAN (Nested Loop on <10% cache rows, indexed only)
Aggregate (cost=257108.11..257108.11 rows=1 width=8) (actual
time=781.000..781.000 rows=1 loops=1)
-> Nested Loop (cost=0.00..256837.30 rows=108324 width=8) (actual
time=0.000..610.000 rows=96183 loops=1)
-> Index Scan using sord_date on sord h (cost=0.00..138.48
rows=2655 width=4) (actual time=0.000..0.000 rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
-> Index Scan using sordln_pkey on sordln l (cost=0.00..96.01
rows=54 width=12) (actual time=0.000..0.118 rows=36 loops=2646)
Index Cond: ("outer".orderno = l.orderno)
Total runtime: 781.000 ms

========== 33% CACHE ROWS ==========

QUERY PLAN (Hash Join on >33% cache rows, indexed + sequential)
Aggregate (cost=43645.62..43645.62 rows=1 width=8) (actual
time=5828.000..5828.000 rows=1 loops=1)
-> Hash Join (cost=484.94..42730.67 rows=365976 width=8) (actual
time=2391.000..5078.000 rows=352856 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Seq Scan on sordln l (cost=0.00..33118.98 rows=1093398
width=12) (actual time=0.000..2234.000 rows=1093398 loops=1)
-> Hash (cost=462.52..462.52 rows=8970 width=4) (actual
time=47.000..47.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..462.52 rows=8970 width=4) (actual time=0.000..0.000
rows=8934 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-09-02'::date))
Total runtime: 5828.000 ms

QUERY PLAN (Merge Join on >33% cache rows, indexed only)
Aggregate (cost=67153.04..67153.04 rows=1 width=8) (actual
time=5985.000..5985.000 rows=1 loops=1)
-> Merge Join (cost=0.00..66238.09 rows=365976 width=8) (actual
time=2953.000..5281.000 rows=352856 loops=1)
Merge Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sord_pkey on sord h (cost=0.00..1402.78
rows=8970 width=4) (actual time=31.000..46.000 rows=8934 loops=1)
Filter: ((orderdate >= '2003-01-01'::date) AND (orderdate
<= '2003-09-02'::date))
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual time=0.000..2485.000
rows=994500 loops=1)
Total runtime: 5985.000 ms

QUERY PLAN (Hash Join on >33% cache rows, indexed only)
Aggregate (cost=68946.43..68946.43 rows=1 width=8) (actual
time=6531.000..6531.000 rows=1 loops=1)
-> Hash Join (cost=484.94..68031.48 rows=365976 width=8) (actual
time=3031.000..5765.000 rows=352856 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual time=0.000..3075.000
rows=1093398 loops=1)
-> Hash (cost=462.52..462.52 rows=8970 width=4) (actual
time=46.000..46.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..462.52 rows=8970 width=4) (actual time=0.000..16.000
rows=8934 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-09-02'::date))
Total runtime: 6531.000 ms

QUERY PLAN (Nested Loop on >33% cache rows, indexed only)
Aggregate (cost=868642.40..868642.40 rows=1 width=8) (actual
time=2828.000..2828.000 rows=1 loops=1)
-> Nested Loop (cost=0.00..867727.46 rows=365976 width=8) (actual
time=0.000..2171.000 rows=352856 loops=1)
-> Index Scan using sord_date on sord h (cost=0.00..462.52
rows=8970 width=4) (actual time=0.000..0.000 rows=8934 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-09-02'::date))
-> Index Scan using sordln_pkey on sordln l (cost=0.00..96.01
rows=54 width=12) (actual time=0.012..0.125 rows=39 loops=8934)
Index Cond: ("outer".orderno = l.orderno)
Total runtime: 2828.000 ms

========== ALL CACHE ROWS ==========

QUERY PLAN (Hash Join on all cache rows, sequential only)
Aggregate (cost=53458.44..53458.44 rows=1 width=8) (actual
time=8906.000..8906.000 rows=1 loops=1)
-> Hash Join (cost=1204.99..50724.94 rows=1093398 width=8) (actual
time=141.000..7089.000 rows=1093397 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Seq Scan on sordln l (cost=0.00..33118.98 rows=1093398
width=12) (actual time=0.000..2629.000 rows=1093398 loops=1)
-> Hash (cost=1137.99..1137.99 rows=26799 width=4) (actual
time=141.000..141.000 rows=0 loops=1)
-> Seq Scan on sord h (cost=0.00..1137.99 rows=26799
width=4) (actual time=0.000..79.000 rows=26799 loops=1)
Total runtime: 8906.000 ms

QUERY PLAN (Merge Join on all cache rows, indexed only)
Aggregate (cost=76156.45..76156.45 rows=1 width=8) (actual
time=9422.000..9422.000 rows=1 loops=1)
-> Merge Join (cost=0.00..73422.95 rows=1093398 width=8) (actual
time=0.000..6835.000 rows=1093397 loops=1)
Merge Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sord_pkey on sord h (cost=0.00..1268.79
rows=26799 width=4) (actual time=0.000..94.000 rows=26799 loops=1)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual time=0.000..2773.000
rows=1093398 loops=1)
Total runtime: 9422.000 ms

QUERY PLAN (Nested Loop on all cache rows, Sequential + indexed)
Aggregate (cost=2594934.26..2594934.26 rows=1 width=8) (actual
time=9234.000..9234.000 rows=1 loops=1)
-> Nested Loop (cost=0.00..2592200.76 rows=1093398 width=8) (actual
time=0.000..6966.000 rows=1093397 loops=1)
-> Seq Scan on sord h (cost=0.00..1137.99 rows=26799 width=4)
(actual time=0.000..110.000 rows=26799 loops=1)
-> Index Scan using sordln_pkey on sordln l (cost=0.00..96.01
rows=54 width=12) (actual time=0.011..0.104 rows=41 loops=26799)
Index Cond: ("outer".orderno = l.orderno)
Total runtime: 9234.000 ms

========== 10% DISK ROWS ==========

QUERY PLAN (Hash Join on <10% disk rows, indexed + sequential)
Aggregate (cost=40085.14..40085.14 rows=1 width=8) (actual
time=12813.000..12813.000 rows=1 loops=1)
-> Hash Join (cost=145.11..39814.32 rows=108324 width=8) (actual
time=11188.000..12592.000 rows=96183 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Seq Scan on sordln l (cost=0.00..33118.98 rows=1093398
width=12) (actual time=31.000..9985.000 rows=1093398 loops=1)
-> Hash (cost=138.48..138.48 rows=2655 width=4) (actual
time=172.000..172.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..138.48 rows=2655 width=4) (actual time=47.000..156.000
rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
Total runtime: 12813.000 ms

QUERY PLAN (Merge Join on <10% disk rows, indexed only)
Aggregate (cost=63338.43..63338.43 rows=1 width=8) (actual
time=23078.000..23078.000 rows=1 loops=1)
-> Merge Join (cost=289.47..63067.62 rows=108324 width=8) (actual
time=20375.000..22874.000 rows=96183 loops=1)
Merge Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual
time=63.000..20657.000 rows=737827 loops=1)
-> Sort (cost=289.47..296.11 rows=2655 width=4) (actual
time=171.000..297.000 rows=96174 loops=1)
Sort Key: h.orderno
-> Index Scan using sord_date on sord h
(cost=0.00..138.48 rows=2655 width=4) (actual time=31.000..171.000
rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
Total runtime: 23078.000 ms

QUERY PLAN (Hash Join on <10% disk rows, indexed only)
Aggregate (cost=65385.95..65385.95 rows=1 width=8) (actual
time=30734.000..30734.000 rows=1 loops=1)
-> Hash Join (cost=145.11..65115.13 rows=108324 width=8) (actual
time=19546.000..30593.000 rows=96183 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual
time=47.000..27711.000 rows=1093398 loops=1)
-> Hash (cost=138.48..138.48 rows=2655 width=4) (actual
time=187.000..187.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..138.48 rows=2655 width=4) (actual time=46.000..171.000
rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
Total runtime: 30734.000 ms

QUERY PLAN (Nested Loop on <10% disk rows, indexed only)
Aggregate (cost=257108.11..257108.11 rows=1 width=8) (actual
time=2704.000..2704.000 rows=1 loops=1)
-> Nested Loop (cost=0.00..256837.30 rows=108324 width=8) (actual
time=94.000..2529.000 rows=96183 loops=1)
-> Index Scan using sord_date on sord h (cost=0.00..138.48
rows=2655 width=4) (actual time=32.000..93.000 rows=2646 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-03-16'::date))
-> Index Scan using sordln_pkey on sordln l (cost=0.00..96.01
rows=54 width=12) (actual time=0.041..0.814 rows=36 loops=2646)
Index Cond: ("outer".orderno = l.orderno)
Total runtime: 2704.000 ms

========== 33% DISK ROWS ==========

QUERY PLAN (Hash Join on >33% disk rows, indexed + sequential)
Aggregate (cost=43645.62..43645.62 rows=1 width=8) (actual
time=13562.000..13562.000 rows=1 loops=1)
-> Hash Join (cost=484.94..42730.67 rows=365976 width=8) (actual
time=8687.000..12985.000 rows=352856 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Seq Scan on sordln l (cost=0.00..33118.98 rows=1093398
width=12) (actual time=31.000..10106.000 rows=1093398 loops=1)
-> Hash (cost=462.52..462.52 rows=8970 width=4) (actual
time=375.000..375.000 rows=0 loops=1)
-> Index Scan using sord_date on sord h
(cost=0.00..462.52 rows=8970 width=4) (actual time=47.000..375.000
rows=8934 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-09-02'::date))
Total runtime: 13562.000 ms

QUERY PLAN (Merge Join on >33% disk rows, indexed only)
Aggregate (cost=67153.04..67153.04 rows=1 width=8) (actual
time=30672.000..30672.000 rows=1 loops=1)
-> Merge Join (cost=0.00..66238.09 rows=365976 width=8) (actual
time=20297.000..29823.000 rows=352856 loops=1)
Merge Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sord_pkey on sord h (cost=0.00..1402.78
rows=8970 width=4) (actual time=578.000..670.000 rows=8934 loops=1)
Filter: ((orderdate >= '2003-01-01'::date) AND (orderdate
<= '2003-09-02'::date))
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual
time=47.000..26509.000 rows=994500 loops=1)
Total runtime: 30672.000 ms

QUERY PLAN (Nested Loop on >33% disk rows, indexed only)
Aggregate (cost=868642.40..868642.40 rows=1 width=8) (actual
time=10235.000..10235.000 rows=1 loops=1)
-> Nested Loop (cost=0.00..867727.46 rows=365976 width=8) (actual
time=78.000..9496.000 rows=352856 loops=1)
-> Index Scan using sord_date on sord h (cost=0.00..462.52
rows=8970 width=4) (actual time=32.000..126.000 rows=8934 loops=1)
Index Cond: ((orderdate >= '2003-01-01'::date) AND
(orderdate <= '2003-09-02'::date))
-> Index Scan using sordln_pkey on sordln l (cost=0.00..96.01
rows=54 width=12) (actual time=0.035..0.912 rows=39 loops=8934)
Index Cond: ("outer".orderno = l.orderno)
Total runtime: 10235.000 ms

========== ALL DISK ROWS ==========

QUERY PLAN (Hash Join on all disk rows, sequential only)
Aggregate (cost=53458.44..53458.44 rows=1 width=8) (actual
time=14281.000..14281.000 rows=1 loops=1)
-> Hash Join (cost=1204.99..50724.94 rows=1093398 width=8) (actual
time=719.000..12096.000 rows=1093397 loops=1)
Hash Cond: ("outer".orderno = "inner".orderno)
-> Seq Scan on sordln l (cost=0.00..33118.98 rows=1093398
width=12) (actual time=16.000..7389.000 rows=1093398 loops=1)
-> Hash (cost=1137.99..1137.99 rows=26799 width=4) (actual
time=703.000..703.000 rows=0 loops=1)
-> Seq Scan on sord h (cost=0.00..1137.99 rows=26799
width=4) (actual time=0.000..657.000 rows=26799 loops=1)
Total runtime: 14281.000 ms

QUERY PLAN (Merge Join on all disk rows, indexed only)
Aggregate (cost=76156.45..76156.45 rows=1 width=8) (actual
time=35235.000..35235.000 rows=1 loops=1)
-> Merge Join (cost=0.00..73422.95 rows=1093398 width=8) (actual
time=94.000..33050.000 rows=1093397 loops=1)
Merge Cond: ("outer".orderno = "inner".orderno)
-> Index Scan using sord_pkey on sord h (cost=0.00..1268.79
rows=26799 width=4) (actual time=47.000..141.000 rows=26799 loops=1)
-> Index Scan using sordln_pkey on sordln l
(cost=0.00..58419.79 rows=1093398 width=12) (actual
time=47.000..28250.000 rows=1093398 loops=1)
Total runtime: 35235.000 ms

QUERY PLAN (Nested Loop on all disk rows, indexed + sequential)
Aggregate (cost=2594934.26..2594934.26 rows=1 width=8) (actual
time=33797.000..33797.000 rows=1 loops=1)
-> Nested Loop (cost=0.00..2592200.76 rows=1093398 width=8) (actual
time=63.000..31744.000 rows=1093397 loops=1)
-> Seq Scan on sord h (cost=0.00..1137.99 rows=26799 width=4)
(actual time=16.000..79.000 rows=26799 loops=1)
-> Index Scan using sordln_pkey on sordln l (cost=0.00..96.01
rows=54 width=12) (actual time=0.039..1.041 rows=41 loops=26799)
Index Cond: ("outer".orderno = l.orderno)
Total runtime: 33797.000 ms

========== ENVIRONMENT ==========

Athlon XP2500, 768MB, 80GB ATA HDD
PostgreSQL 8.0rc2 on Win2k
shared_buffers = 1000
work_mem = 32768
random_page_cost = 2
sord = 27000 rows, 7MB, pkey = int4, Stats = 100 on OrderDate
sordln = 1 million rows, 173MB, pkey = int4 + int2


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Brown <time(at)bigpond(dot)net(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner really hates nested loops
Date: 2005-02-03 16:25:31
Message-ID: 16694.1107447931@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

David Brown <time(at)bigpond(dot)net(dot)au> writes:
> I'm hoping someone can shed some light on these results.

Not without a lot more detail on how you *got* the results. What
exactly did you do to force the various plan choices? (I see some
ridiculous choices of indexscans, for instance, suggesting improper use
of enable_seqscan in some cases.) And what's the "cache rows" and "disk
rows" stuff, and how do you know that what you were measuring is
actually what you think it is? I have zero confidence in
Windows-atop-ATA as a platform for measuring disk-related behaviors,
because I don't think you can control or even know what caching is
going on.

regards, tom lane


From: David Brown <time(at)bigpond(dot)net(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner really hates nested loops
Date: 2005-02-03 23:22:39
Message-ID: 4202B23F.5080604@bigpond.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:

>What exactly did you do to force the various plan choices? (I see some
>ridiculous choices of indexscans, for instance, suggesting improper use
>of enable_seqscan in some cases.)
>
Except for forcing a hash with indexes (to show that increased use of
indexes is not necessarily good), the "ridiculous choices of indexscans"
are straight from the planner, i.e. I did not use enable_seqscan.
Obviously, the alternative join methods were obtained by disabling hash
joins and merge joins.

> And what's the "cache rows" and "disk
>rows" stuff, and how do you know that what you were measuring is
>actually what you think it is? I have zero confidence in
>Windows-atop-ATA as a platform for measuring disk-related behaviors,
>because I don't think you can control or even know what caching is
>going on.
>
>
The terms are just abbreviated headings to make it easier to check what
you're looking at. "Cache" refers to repeated runs without disk I/O.
"Disk" refers to a completely initialized system with no PostgreSQL data
in the OS cache (i.e. after a reboot - this is Benchmarking 101). All
results were verified with *at least* two runs at different times. This
is not to say the "disk" results are an accurate or absolute benchmark,
but they're useful as a reference when looking at the cached results.

In any case, I can get the same "cached" results by increasing buffers
to take up most of the memory and thereby make them the defacto cache.

With respect, could we please focus on the point of this thread? I've
spent a great deal of time experimenting with PostgreSQL over the last
couple of months, including reading every known web page regarding
tuning and following every post in this list in that period. I'm
confident that my results here are what most people will experience when
trying PostgreSQL, and I'd like to help in a constructive way.