Why will hashed SubPlan not use multiple batches

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Why will hashed SubPlan not use multiple batches
Date: 2013-01-25 20:50:48
Message-ID: CAMkU=1zoV1wqLrrs7a_PAC8aAe8uDbiCachNQ1FmdBYAKcBKcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A hashed SubPlan will not be used if it would need more than one
batch. Is there a fundamental reason for that, or just that no one
got around to adding it?

A small decrease in work_mem leads to a 38000 fold change in estimated
query execution (and that might be accurate, as the actual change in
execution is too large to measure)

I have no control over the real query itself (otherwise changing it
from NOT IN to NOT EXISTS would fix it, because that hash plan will
use multiple batches).

I have temporarily fixed it by increasing work_mem, but it would be
better if the planner did the best with the resources it had.

This example works with default settings on "PostgreSQL 9.2.2 on
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red
Hat 4.4.6-4), 64-bit". Same behavior on 9.1.7 and 9.3dev.

Is this a Todo item?

test case below.

create table foo as select (random()*10000)::integer as bar from
generate_series(1,100000);
create table foo2 as select (random()*10000)::integer as bar2 from
generate_series(1,100000);
analyze;

set work_mem TO 3300;
explain select foo.bar from foo where bar not in (select bar2 from foo2);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on foo (cost=1693.00..3386.00 rows=50000 width=4)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on foo2 (cost=0.00..1443.00 rows=100000 width=4)
(4 rows)

set work_mem TO 3100;
explain select foo.bar from foo where bar not in (select bar2 from foo2);
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..129201693.00 rows=50000 width=4)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..2334.00 rows=100000 width=4)
-> Seq Scan on foo2 (cost=0.00..1443.00 rows=100000 width=4)

Cheers,

Jeff

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2013-01-25 21:09:39 Re: replace plugins directory with GUC
Previous Message Bruce Momjian 2013-01-25 20:40:57 Re: setting per-database/role parameters checks them against wrong context