IN vs EXISTS equivalence

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: IN vs EXISTS equivalence
Date: 2007-10-22 14:31:23
Message-ID: 471C6DEB.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've requested this before without response, but I'm asking again
because it just caused me pain again: could we get a TODO added to
have the planner recognize equivalent IN and EXISTS constructs and
have them compete on cost estimates? I know it's not a trivial
improvement, but if it's on the list maybe someone will pick it up,
and I see it as the single biggest weakness in PostgreSQL
performance.

I don't need help resolving this particular case, because the fix is
always blinding obvious when we hit this, and it doesn't even break
portability because no other database we've tested fails to recognize
these equivalent cases.

step=# explain DELETE FROM "Body" WHERE "bodySeqNo" NOT IN (SELECT "bodySeqNo" FROM "Message");
QUERY PLAN
----------------------------------------------------------------------------------
Seq Scan on "Body" (cost=90277.43..285235351699.39 rows=3313379 width=6)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=90277.43..159793.40 rows=6627957 width=11)
-> Seq Scan on "Message" (cost=0.00..80413.07 rows=6627957 width=11)
(5 rows)

step=# explain DELETE FROM "Body" WHERE NOT EXISTS (SELECT * FROM "Message" m WHERE m."bodySeqNo" = "Body"."bodySeqNo");
QUERY PLAN
--------------------------------------------------------------------------------------------
Seq Scan on "Body" (cost=0.00..3401760.88 rows=3313416 width=6)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using "Message_Body" on "Message" m (cost=0.00..0.49 rows=1 width=136)
Index Cond: (("bodySeqNo")::numeric = ($0)::numeric)
(5 rows)

The bodySeqNo column is NOT NULL in both tables, and is the primary
key in the Body table. The Message table has a non-unique index on
it. (\d lists will follow at the bottom.)

I cancelled the first query after it had been running for 54 hours
over our slowest hours (the weekend). The second form ran in four
minutes in competition with peak time queries.

-Kevin


step=# \d "Body"
Table "public.Body"
Column | Type | Modifiers
-------------+------------------------+-----------
bodySeqNo | "SequenceT" | not null
contentType | character varying(255) | not null
encoding | character varying(255) |
body | "BodyT" |
Indexes:
"Body_pkey" PRIMARY KEY, btree ("bodySeqNo")

step=# \d "Message"
Table "public.Message"
Column | Type | Modifiers
-----------------+--------------------------+-----------
messageId | "SequenceT" | not null
clientMessageId | "ClientMessageIdT" | not null
correlationId | "SequenceT" |
destQueue | "QueueNameT" | not null
replyToQueue | "QueueNameT" | not null
typeCode | character(2) |
expiration | timestamp with time zone |
priority | smallint | not null
status | character(2) | not null
created | timestamp with time zone | not null
lastModified | timestamp with time zone | not null
bodySeqNo | "SequenceT" | not null
messageIdSearch | "PrioritySequenceT" | not null
Indexes:
"Message_pkey" PRIMARY KEY, btree ("messageId")
"MessageIndex2" UNIQUE, btree ("destQueue", "clientMessageId")
"Message_MessageIdSearch" UNIQUE, btree ("destQueue", status, "messageIdSearch") CLUSTER
"Message_Body" btree ("bodySeqNo")
"Message_Created" btree ("destQueue", status, created)
"Message_Created2" btree ("destQueue", created)
"Message_Expiration" btree (expiration)
"Message_LastModified" btree ("destQueue", "lastModified")
"Message_ReplyToQueue" btree ("replyToQueue")
Foreign-key constraints:
"Message_fk1" FOREIGN KEY ("destQueue") REFERENCES "Queue"(name)
"Message_fk2" FOREIGN KEY ("replyToQueue") REFERENCES "Queue"(name)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2007-10-22 14:33:17 Re: 8.2.3: Server crashes on Windows using Eclipse/Junit
Previous Message Magnus Hagander 2007-10-22 14:31:14 Re: Ready for beta2?