BUG #4284: Optimizer chooses bad plan with LEFT join

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4284: Optimizer chooses bad plan with LEFT join
Date: 2008-07-05 17:11:25
Message-ID: 200807051711.m65HBPkd083426@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 4284
Logged by: David Rowley
Email address: dgrowley(at)gmail(dot)com
PostgreSQL version: 8.3.3
Operating system: Windows XP and 2003
Description: Optimizer chooses bad plan with LEFT join
Details:

I was asked a few days ago to gather certain information from our database.
The schema was never really designed for this query and it’s quite a bodge
job, but it gets the intended results.

I was quite surprised that the query I came up with took over 2 minutes to
run, giving that the biggest of the tables is only around 30k rows. A quick
look at the query plan showed me that the optimizer was choosing to perform
a nested loop for the left join, probably since it thought there was only 1
row in that (derived) table.

I have managed to re-create this and I have included a full re-creation
script including create tables and inserts to generate some mocked up data.

I’ve only tested on 8.3.3, I will test if required on other versions but
hopefully the recreation script will stop the need.

I’ve tuned the number of rows in the mocked up tables so the example query
only takes around 10 seconds (on this computer). That should be enough time
to show that it’s not the best plan.

Let me know if you are unable to re-created and I’ll see what I can do to
provide more information, (settings etc.)

I can get the query to run more quickly in the meantime just by using a FULL
OUTER JOIN and adding a WHERE clause to filter out the NULLs

Anyway… less talk more facts you say…

CREATE TABLE production (
despid SERIAL,
productiondate DATE NOT NULL,
lineid INT NOT NULL,
partcode VARCHAR(20) NOT NULL,
quantity INT NOT NULL,
PRIMARY KEY(despid)
);

CREATE TABLE production_line (
lineid INT PRIMARY KEY,
linename VARCHAR(16) NOT NULL
);

CREATE TABLE batches (
batchid SERIAL,
batchcode VARCHAR(16) NOT NULL,
lineid INT NOT NULL,
partcode VARCHAR(20) NOT NULL,
productiondate DATE NOT NULL,
bestbefore DATE NOT NULL,
PRIMARY KEY(batchid)
);

/* Table to help create mock data */

CREATE TABLE parts (
partcode VARCHAR(20) NOT NULL,
PRIMARY KEY(partcode)
);

/* Create fake parts */

INSERT INTO parts (partcode)
VALUES('PART1'),('PART2'),('PART3'),('PART4'),('PART5'),('PART6'),('PART7');

/* Table to help create mock data */
CREATE TABLE calendar (
date DATE,
PRIMARY KEY(date)
);

/* 200 rows here is ok for testing */

INSERT INTO calendar (date) SELECT '2008-06-30'::DATE + num * '1
day'::INTERVAL FROM (select generate_series(1,200) AS num) AS series;

INSERT INTO production_line (lineid,linename)
VALUES(1,'Line 1'),(2,'Line 2'),(3,'Line 3');

/* Populate production with the cartasian product of calendar,
production_line and parts
* We need a table big enough to get the optimizer not to nest loop. Or at
least to produce
* a query where a nested loop would be slower than a merge join.
*/

INSERT INTO production (productiondate,lineid,partcode,quantity)
SELECT cal.date,
l.lineid,
p.partcode,
CAST(RANDOM() * 1000 AS INT) AS qty
FROM calendar AS cal, production_line AS l, parts AS p;


/* Now to populate the batches table, cartasian product once again. */

INSERT INTO batches (batchcode,lineid,partcode,productiondate,bestbefore)
SELECT 'ABC123',
l.lineid,
p.partcode,
cal.date,
cal.date + '100 days'::INTERVAL
FROM calendar AS cal, production_line AS l, parts AS p;

/* Ensure we have stats for these tables */
ANALYZE;

EXPLAIN ANALYZE SELECT t1.productiondate,
t3.linename,
t1.partcode,
t1.batchcode,
t1.bestbefore,
t4.quantity / t2.number_of_bbdates AS est_packs_in_batch,
t2.number_of_bbdates AS number_of_batches
FROM batches t1
INNER JOIN (SELECT productiondate,
lineid,
partcode,
COUNT(DISTINCT bestbefore) AS number_of_bbdates
FROM batches
GROUP BY productiondate,lineid,partcode
) AS t2 ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid
AND t1.partcode = t2.partcode
INNER JOIN production_line t3 ON t1.lineid = t3.lineid
LEFT OUTER JOIN (SELECT productiondate,
lineid,
partcode,
SUM(quantity) AS quantity
FROM production
GROUP BY productiondate,partcode,lineid
) t4 ON t1.productiondate = t4.productiondate AND t1.lineid = t4.lineid AND
t1.partcode = t4.partcode
;

/*
test=# SELECT VERSION();
version
-----------------------------------------------------
PostgreSQL 8.3.3, compiled by Visual C++ build 1400

EXPLAIN ANALYZE output

QUERY PLAN

----------------------------------------------------------------------------
----------------------------------------------------------------------------
-----

-

Nested Loop Left Join (cost=501.52..638.58 rows=1 width=44) (actual
time=155.552..40524.149 rows=4200 loops=1)

Join Filter: ((t1.productiondate = production.productiondate) AND
(t1.lineid = production.lineid) AND ((t1.partcode)::text =
(production.partcode)::text))

-> Hash Join (cost=390.52..510.78 rows=1 width=40) (actual
time=120.452..146.105 rows=4200 loops=1)

Hash Cond: ((t1.productiondate = batches.productiondate) AND
(t1.lineid = batches.lineid) AND ((t1.partcode)::text =
(batches.partcode)::text))

-> Seq Scan on batches t1 (cost=0.00..73.00 rows=4200 width=25)
(actual time=0.020..6.100 rows=4200 loops=1)

-> Hash (cost=390.41..390.41 rows=6 width=85) (actual
time=120.392..120.392 rows=4200 loops=1)

-> Hash Join (cost=326.83..390.41 rows=6 width=85) (actual
time=35.510..106.457 rows=4200 loops=1)

Hash Cond: (batches.lineid = t3.lineid)

-> GroupAggregate (cost=325.76..383.51 rows=420
width=18) (actual time=35.446..90.534 rows=4200 loops=1)

-> Sort (cost=325.76..336.26 rows=4200
width=18) (actual time=35.402..41.801 rows=4200 loops=1)

Sort Key: batches.productiondate,
batches.lineid, batches.partcode

Sort Method: quicksort Memory: 424kB

-> Seq Scan on batches (cost=0.00..73.00
rows=4200 width=18) (actual time=0.011..7.317 rows=4200 loops=1)

-> Hash (cost=1.03..1.03 rows=3 width=11) (actual
time=0.035..0.035 rows=3 loops=1)

-> Seq Scan on production_line t3
(cost=0.00..1.03 rows=3 width=11) (actual time=0.014..0.020 rows=3 loops=1)

-> HashAggregate (cost=111.00..116.25 rows=420 width=18) (actual
time=0.007..5.387 rows=4200 loops=4200)

-> Seq Scan on production (cost=0.00..69.00 rows=4200 width=18)
(actual time=0.024..5.880 rows=4200 loops=1)

Total runtime: 40528.569 ms

(18 rows)

Time: 40532.405 ms

EXPLAIN ANALYZE with SET enable_nestloop = off;

QUERY
PLAN

----------------------------------------------------------------------------
----------------------------------------------------------------------------
-

Hash Left Join (cost=518.32..638.75 rows=1 width=44) (actual
time=171.395..206.840 rows=4200 loops=1)

Hash Cond: ((t1.productiondate = t4.productiondate) AND (t1.lineid =
t4.lineid) AND ((t1.partcode)::text = (t4.partcode)::text))

-> Hash Join (cost=390.52..510.78 rows=1 width=40) (actual
time=115.980..134.411 rows=4200 loops=1)

Hash Cond: ((t1.productiondate = batches.productiondate) AND
(t1.lineid = batches.lineid) AND ((t1.partcode)::text =
(batches.partcode)::text))

-> Seq Scan on batches t1 (cost=0.00..73.00 rows=4200 width=25)
(actual time=0.020..4.160 rows=4200 loops=1)

-> Hash (cost=390.41..390.41 rows=6 width=85) (actual
time=115.916..115.916 rows=4200 loops=1)

-> Hash Join (cost=326.83..390.41 rows=6 width=85) (actual
time=35.429..102.996 rows=4200 loops=1)

Hash Cond: (batches.lineid = t3.lineid)

-> GroupAggregate (cost=325.76..383.51 rows=420
width=18) (actual time=35.366..86.402 rows=4200 loops=1)

-> Sort (cost=325.76..336.26 rows=4200
width=18) (actual time=35.316..40.902 rows=4200 loops=1)

Sort Key: batches.productiondate,
batches.lineid, batches.partcode

Sort Method: quicksort Memory: 424kB

-> Seq Scan on batches (cost=0.00..73.00
rows=4200 width=18) (actual time=0.011..7.131 rows=4200 loops=1)

-> Hash (cost=1.03..1.03 rows=3 width=11) (actual
time=0.035..0.035 rows=3 loops=1)

-> Seq Scan on production_line t3
(cost=0.00..1.03 rows=3 width=11) (actual time=0.007..0.013 rows=3 loops=1)

-> Hash (cost=120.45..120.45 rows=420 width=74) (actual
time=55.356..55.356 rows=4200 loops=1)

-> Subquery Scan t4 (cost=111.00..120.45 rows=420 width=74)
(actual time=22.242..42.829 rows=4200 loops=1)

-> HashAggregate (cost=111.00..116.25 rows=420 width=18)
(actual time=22.236..31.771 rows=4200 loops=1)

-> Seq Scan on production (cost=0.00..69.00 rows=4200
width=18) (actual time=0.024..5.811 rows=4200 loops=1)

Total runtime: 211.055 ms

(20 rows)

Time: 214.725 ms

*/

Browse pgsql-bugs by date

  From Date Subject
Next Message Hiroshi Saito 2008-07-05 17:40:06 Re: BUG #4274: uuid returns duplicate values
Previous Message thomas 2008-07-05 00:32:11 BUG #4281: some types of errors do not log statements