bad estimation together with large work_mem generates terrible slow hash joins

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-06-26 12:10:55
Message-ID: CAFj8pRA4BEtna-WEV6NUwN8_0bJoquv3677zvms7+=0No7c6ew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all,

today I had to work with one slow query - when I checked different
scenarios I found a dependency on work_mem size - but it is atypical -
bigger work_mem increased query execution 31 minutes (600MB work mem) and 1
minute (1MB).

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '600MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( (
CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON (
"f_emails_aaekn159p5aw3t8"."userid_id" = "f_users_aax8qg0e23asx5h"."id" )
) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON (
"f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" =
"dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 =
"dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2145957.37..2145957.96 rows=59 width=12) (actual
time=1864088.145..1864088.155 rows=31 loops=1)
-> Hash Join (cost=882573.27..2142753.08 rows=213619 width=12) (actual
time=9083.326..1859069.171 rows=7070141 loops=1)
Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id =
f_users_aax8qg0e23asx5h.id)
-> Seq Scan on f_emails_aaekn159p5aw3t8 (cost=0.00..854559.95
rows=32278695 width=12) (actual time=0.006..14271.239 rows=32211565 loops=1)
-> Hash (cost=881801.58..881801.58 rows=61735 width=8) (actual
time=9076.153..9076.153 rows=3310768 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 129327kB
-> Hash Join (cost=782.15..881801.58 rows=61735 width=8)
(actual time=1.423..8086.228 rows=3310768 loops=1)
Hash Cond:
(f_users_aax8qg0e23asx5h.dt_registrationdatetime_id =
dwh_dm_abnblk9j5oagorw.id)
-> Seq Scan on f_users_aax8qg0e23asx5h
(cost=0.00..845420.42 rows=9328442 width=12) (actual time=0.015..4098.915
rows=9322096 loops=1)
-> Hash (cost=777.59..777.59 rows=365 width=4)
(actual time=1.400..1.400 rows=365 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw
(cost=11.08..777.59 rows=365 width=4) (actual time=0.111..1.218 rows=365
loops=1)
Recheck Cond: (2014 = id_year)
-> Bitmap Index Scan on
dwh_dm_abnblk9j5oagorw_id_year_idx (cost=0.00..10.99 rows=365 width=0)
(actual time=0.068..0.068 rows=365 loops=1)
Index Cond: (2014 = id_year)
Total runtime: 1864107.373 ms
(16 rows)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '1MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( (
CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON (
"f_emails_aaekn159p5aw3t8"."userid_id" = "f_users_aax8qg0e23asx5h"."id" )
) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON (
"f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" =
"dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 =
"dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2275675.45..2275675.95 rows=50 width=12) (actual
time=48438.407..48438.416 rows=31 loops=1)
-> Hash Join (cost=882772.88..2272526.68 rows=209918 width=12) (actual
time=9384.610..45211.963 rows=6988804 loops=1)
Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id =
f_users_aax8qg0e23asx5h.id)
-> Seq Scan on f_emails_aaekn159p5aw3t8 (cost=0.00..839754.64
rows=31719464 width=12) (actual time=0.034..14299.338 rows=31653392 loops=1)
-> Hash (cost=881759.44..881759.44 rows=61715 width=8) (actual
time=9368.371..9368.371 rows=3310768 loops=1)
Buckets: 4096 Batches: 256 (originally 4) Memory Usage:
1025kB
-> Hash Join (cost=782.15..881759.44 rows=61715 width=8)
(actual time=3.839..8223.097 rows=3310768 loops=1)
Hash Cond:
(f_users_aax8qg0e23asx5h.dt_registrationdatetime_id =
dwh_dm_abnblk9j5oagorw.id)
-> Seq Scan on f_users_aax8qg0e23asx5h
(cost=0.00..845389.92 rows=9325392 width=12) (actual time=0.017..4118.598
rows=9322096 loops=1)
-> Hash (cost=777.59..777.59 rows=365 width=4)
(actual time=3.804..3.804 rows=365 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw
(cost=11.08..777.59 rows=365 width=4) (actual time=0.188..3.641 rows=365
loops=1)
Recheck Cond: (2014 = id_year)
-> Bitmap Index Scan on
dwh_dm_abnblk9j5oagorw_id_year_idx (cost=0.00..10.99 rows=365 width=0)
(actual time=0.115..0.115 rows=365 loops=1)
Index Cond: (2014 = id_year)
Total runtime: 48439.958 ms
(16 rows)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '10MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( (
CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON (
"f_emails_aaekn159p5aw3t8"."userid_id" = "f_users_aax8qg0e23asx5h"."id" )
) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON (
"f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" =
"dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 =
"dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2124026.77..2124027.27 rows=50 width=12) (actual
time=100739.984..100739.998 rows=31 loops=1)
-> Hash Join (cost=882530.88..2120878.00 rows=209918 width=12) (actual
time=9467.702..97238.959 rows=6988804 loops=1)
Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id =
f_users_aax8qg0e23asx5h.id)
-> Seq Scan on f_emails_aaekn159p5aw3t8 (cost=0.00..839754.64
rows=31719464 width=12) (actual time=0.015..9185.978 rows=31653392 loops=1)
-> Hash (cost=881759.44..881759.44 rows=61715 width=8) (actual
time=9466.440..9466.440 rows=3310768 loops=1)
Buckets: 8192 Batches: 16 (originally 1) Memory Usage:
10241kB
-> Hash Join (cost=782.15..881759.44 rows=61715 width=8)
(actual time=3.681..8183.423 rows=3310768 loops=1)
Hash Cond:
(f_users_aax8qg0e23asx5h.dt_registrationdatetime_id =
dwh_dm_abnblk9j5oagorw.id)
-> Seq Scan on f_users_aax8qg0e23asx5h
(cost=0.00..845389.92 rows=9325392 width=12) (actual time=0.012..4064.044
rows=9322096 loops=1)
-> Hash (cost=777.59..777.59 rows=365 width=4)
(actual time=3.654..3.654 rows=365 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw
(cost=11.08..777.59 rows=365 width=4) (actual time=0.129..3.449 rows=365
loops=1)
Recheck Cond: (2014 = id_year)
-> Bitmap Index Scan on
dwh_dm_abnblk9j5oagorw_id_year_idx (cost=0.00..10.99 rows=365 width=0)
(actual time=0.077..0.077 rows=365 loops=1)
Index Cond: (2014 = id_year)
Total runtime: 100740.552 ms
(16 rows)

The basic problem is wrong estimation - but I am not able to fix it - I
changed a statistics to 10000 without any change. I cannot to change SQL in
application, but for debugging I can test it with fixed estimation:

db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"stehule"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388
= "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN stehule on
"f_emails_aaekn159p5aw3t8"."userid_id" = stehule."id" ) ) GROUP BY 1 ;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2073456.55..2073457.23 rows=68 width=12) (actual
time=58724.094..58724.106 rows=31 loops=1)
-> Hash Join (cost=89142.28..1589276.13 rows=32278695 width=12)
(actual time=2191.019..55499.331 rows=7070141 loops=1)
Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = stehule.id)
-> Seq Scan on f_emails_aaekn159p5aw3t8 (cost=0.00..854559.95
rows=32278695 width=12) (actual time=0.018..8364.554 rows=32211565 loops=1)
-> Hash (cost=47757.68..47757.68 rows=3310768 width=8) (actual
time=2188.858..2188.858 rows=3310768 loops=1)
Buckets: 524288 Batches: 1 Memory Usage: 129327kB
-> Seq Scan on stehule (cost=0.00..47757.68 rows=3310768
width=8) (actual time=0.026..927.235 rows=3310768 loops=1)
Total runtime: 58741.224 ms
(8 rows)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '1MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"stehule"."firtsclickutmsource_id" AS "a_5078", COUNT( ( CASE WHEN ( 89388
= "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN stehule on
"f_emails_aaekn159p5aw3t8"."userid_id" = stehule."id" ) ) GROUP BY 1 ;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2293499.45..2293500.13 rows=68 width=12) (actual
time=37357.967..37357.976 rows=31 loops=1)
-> Hash Join (cost=102075.28..1809319.02 rows=32278695 width=12)
(actual time=1814.232..34290.818 rows=7070141 loops=1)
Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id = stehule.id)
-> Seq Scan on f_emails_aaekn159p5aw3t8 (cost=0.00..854559.95
rows=32278695 width=12) (actual time=0.007..14066.754 rows=32211565 loops=1)
-> Hash (cost=47757.68..47757.68 rows=3310768 width=8) (actual
time=1806.453..1806.453 rows=3310768 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 518kB
-> Seq Scan on stehule (cost=0.00..47757.68 rows=3310768
width=8) (actual time=0.007..781.672 rows=3310768 loops=1)
Total runtime: 37359.264 ms
(8 rows)

Still less work_mem performs better - but with our default 600MB it
performs relatively well

db_kost07e2d9cdmg20b1takpqntobo6ghj=# select version();

version
---------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.8 on x86_64-gdc-linux-gnu, with GoodData patches, compiled
by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# \dt+ stehule
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+----------+--------+-------------
public | stehule | table | postgres | 114 MB |
(1 row)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# \dt+ f_users_aax8qg0e23asx5h
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------------------+-------+-------+---------+-------------
public | f_users_aax8qg0e23asx5h | table | beard | 5878 MB |
(1 row)

db_kost07e2d9cdmg20b1takpqntobo6ghj=# \dt+ f_emails_aaekn159p5aw3t8
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------------------------+-------+-------+---------+-------------
public | f_emails_aaekn159p5aw3t8 | table | beard | 4156 MB |
(1 row)

all data are in FS cache. There are 60GB RAM.

Is interesting so 31M nested loops is significantly faster, than bad hash
join

db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( (
CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON (
"f_emails_aaekn159p5aw3t8"."userid_id" = "f_users_aax8qg0e23asx5h"."id" )
) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON (
"f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" =
"dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 =
"dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;

QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=40076389.70..40076448.70 rows=59 width=12) (actual
time=31124.643..31124.658 rows=31 loops=1)
-> Nested Loop (cost=2773.92..40073185.42 rows=213619 width=12)
(actual time=37.945..27670.473 rows=7070141 loops=1)
-> Hash Join (cost=2773.92..10180068.58 rows=61735 width=8)
(actual time=0.951..8934.170 rows=3310768 loops=1)
Hash Cond:
(f_users_aax8qg0e23asx5h.dt_registrationdatetime_id =
dwh_dm_abnblk9j5oagorw.id)
-> Seq Scan on f_users_aax8qg0e23asx5h
(cost=0.00..10080578.00 rows=9328442 width=12) (actual time=0.004..4297.579
rows=9322096 loops=1)
-> Hash (cost=2408.01..2408.01 rows=365 width=4) (actual
time=0.919..0.919 rows=365 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw
(cost=386.27..2408.01 rows=365 width=4) (actual time=0.104..0.733 rows=365
loops=1)
Recheck Cond: (2014 = id_year)
-> Bitmap Index Scan on
dwh_dm_abnblk9j5oagorw_id_year_idx (cost=0.00..386.18 rows=365 width=0)
(actual time=0.059..0.059 rows=365 loops=1)
Index Cond: (2014 = id_year)
-> Index Scan using f_emails_aaekn159p5aw3t8_userid_id_idx on
f_emails_aaekn159p5aw3t8 (cost=0.00..396.22 rows=88 width=12) (actual
time=0.002..0.004 rows=2 loops=3310768)
Index Cond: (userid_id = f_users_aax8qg0e23asx5h.id)
Total runtime: 31125.975 ms

Regards

Pavel

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rushabh Lathia 2014-06-26 12:12:18 Re: "RETURNING PRIMARY KEY" syntax extension
Previous Message Michael Glaesemann 2014-06-26 12:04:48 Re: sorting a union over inheritance vs pathkeys