anti-join chosen even when slower than old plan

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: anti-join chosen even when slower than old plan
Date: 2010-11-09 21:18:13
Message-ID: 4CD9663502000025000374B1@gw.wicourts.gov

The semi-join and anti-join support has helped us quite a bit, but we have
hit a situation where an anti-join is chosen even though it is slower
than the "old-fashioned" plan. I know there have been other reports
of this, but I just wanted to go on record with my details.

The query:

delete from "DbTranLogRecord"
 where not exists
       (select * from "DbTranRepository" r
         where r."countyNo" = "DbTranLogRecord"."countyNo"
           and r."tranImageSeqNo" = "DbTranLogRecord"."tranImageSeqNo");

Old plan on 8.3.7:

 Seq Scan on "DbTranLogRecord"  (cost=0.00..1224227790.06 rows=333387520 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..1.83 rows=1 width=974)
           Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("tranImageSeqNo")::numeric = ($1)::numeric))

Deletes about 9.2 million rows in 7 hours and 20 minutes.

New plan on 9.0.1:

 Delete  (cost=0.00..93918390.38 rows=1 width=12)
   ->  Merge Anti Join  (cost=0.00..93918390.38 rows=1 width=12)
         Merge Cond: ((("DbTranLogRecord"."countyNo")::smallint = (r."countyNo")::smallint) AND (("DbTranLogRecord"."tranImageSeqNo")::numeric = (r."tranImageSeqNo")::numeric))
         ->  Index Scan using "DbTranLogRecordPK" on "DbTranLogRecord"  (cost=0.00..73143615.91 rows=675405504 width=20)
         ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..16328700.43 rows=152541168 width=20)

Cancelled after 39 hours and 25 minutes.

I know how to work around it by adding OFFSET 0 to the subquery or by
tweaking the costing for that one query; I'm just sharing the information.
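
For the record, the OFFSET 0 workaround just means sticking OFFSET 0
inside the NOT EXISTS subquery so the planner can't pull it up into an
anti-join (a sketch only; I haven't re-timed this exact form here):

delete from "DbTranLogRecord"
 where not exists
       (select * from "DbTranRepository" r
         where r."countyNo" = "DbTranLogRecord"."countyNo"
           and r."tranImageSeqNo" = "DbTranLogRecord"."tranImageSeqNo"
        offset 0);  -- OFFSET 0 keeps the subquery from being flattened into a join

The costing tweak would instead be something like SET LOCAL
enable_mergejoin = off (or adjusting the relevant cost settings) in a
transaction around this one statement; which knob to turn is a judgment
call.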

Also, we know these tables might be good candidates for
partitioning, but that's an issue for another day.

Table "public.DbTranLogRecord"
Column | Type | Modifiers
----------------+-------------------+-----------
countyNo | "CountyNoT" | not null
tranImageSeqNo | "TranImageSeqNoT" | not null
logRecordSeqNo | "LogRecordSeqNoT" | not null
operation | "OperationT" | not null
tableName | "TableNameT" | not null
Indexes:
"DbTranLogRecordPK" PRIMARY KEY, btree ("countyNo",
"tranImageSeqNo", "logRecordSeqNo")
"DbTranLogRecord_TableNameSeqNo" btree ("countyNo", "tableName",
"tranImageSeqNo", operation)

Table "public.DbTranRepository"
Column | Type | Modifiers
------------------+------------------------+-----------
countyNo | "CountyNoT" | not null
tranImageSeqNo | "TranImageSeqNoT" | not null
timestampValue | "TimestampT" | not null
transactionImage | "ImageT" |
status | character(1) | not null
queryName | "QueryNameT" |
runDuration | numeric(15,0) |
userId | "UserIdT" |
functionalArea | "FunctionalAreaT" |
sourceRef | character varying(255) |
url | "URLT" |
tranImageSize | numeric(15,0) |
Indexes:
"DbTranRepositoryPK" PRIMARY KEY, btree ("countyNo",
"tranImageSeqNo") CLUSTER
"DbTranRepository_UserId" btree ("countyNo", "userId",
"tranImageSeqNo")
"DbTranRepository_timestamp" btree ("countyNo", "timestampValue")

            relname             | relpages |  reltuples  | pg_relation_size
--------------------------------+----------+-------------+------------------
 DbTranLogRecord                |  5524411 | 6.75406e+08 | 42 GB
 DbTranLogRecordPK              |  6581122 | 6.75406e+08 | 50 GB
 DbTranLogRecord_TableNameSeqNo |  6803441 | 6.75406e+08 | 52 GB
 DbTranRepository               | 22695447 | 1.52376e+08 | 173 GB
 DbTranRepositoryPK             |  1353643 | 1.52376e+08 | 10 GB
 DbTranRepository_UserId        |  1753793 | 1.52376e+08 | 13 GB
 DbTranRepository_timestamp     |  1353682 | 1.52376e+08 | 10 GB
(7 rows)

oprofile output, taken while not much other than this delete was running:

samples  %        symbol name
2320174  33.7617  index_getnext
 367268   5.3443  LWLockAcquire
 299131   4.3528  hash_search_with_hash_value
 249459   3.6300  HeapTupleSatisfiesMVCC
 229558   3.3404  PinBuffer
 222673   3.2402  _bt_checkkeys
 204416   2.9745  LWLockRelease
 194336   2.8279  heap_page_prune_opt
 152353   2.2169  XidInMVCCSnapshot
 121131   1.7626  AllocSetAlloc
  91123   1.3260  SearchCatCache
  88394   1.2863  nocache_index_getattr
  85936   1.2505  pglz_compress
  76531   1.1136  heap_hot_search_buffer
  69532   1.0118  _mdfd_getseg
  68743   1.0003  FunctionCall2
  64720   0.9418  TransactionIdPrecedes
  45298   0.6591  texteq
  43183   0.6284  UnpinBuffer
  40666   0.5917  base_yyparse

If you want more details or the opannotate level, let me know.

-Kevin
