Postgres 8.4 planner question - bad plan, good plan for almost same queries.

From: Дмитрий <fozzy(at)ac-sw(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Postgres 8.4 planner question - bad plan, good plan for almost same queries.
Date: 2012-03-20 08:12:44
Message-ID: 4F683BFC.5030400@ac-sw.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC
gcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit
under Ubuntu 11.10.

Got large table with geography (PostGIS 1.5) data, running two almost
same queries with same result, but very different performance.

First query:

with uuu as (
select dml.id
from mp_locs12 dml
where (complex conditions, leaving about 100 rows from millions)
)
select label, country, region, parish, city, district,
st_geometrytype(loc::geometry) as gtype,
'0x' || to_hex(type) as n_type, file_name, line
from mp_locs12 dml1
where dml1.id in (select uu.id from uuu uu)
and not exists (
select 1 from mp_locs12 dml2
where dml2.id in (select uu.id from uuu uu)
and dml2.id <> dml1.id
and not st_contains(dml1.loc::geometry, dml2.loc::geometry)
);

Planner choose to seqscan dml2 table in NOT EXISTS condition, very
surprising for me - "dml2.id in (select uu.id from uuu uu)" is a best
cut off here I believe, but planner thinks different (for dml1 it does not).

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=2451523.67..8486747.31 rows=1 width=2130)
Join Filter: ((dml2.id <> dml1.id) AND ((NOT ((dml1.loc)::geometry
&& (dml2.loc)::geometry)) OR (NOT _st_contains((dml1.loc)::geometry,
(dml2.loc)::geometry))))
CTE uuu
-> Seq Scan on mp_locs12 dml (cost=0.00..2451523.62 rows=1 width=4)
Filter: (complex conditions, leaving about 100 rows from
millions)
-> Nested Loop (cost=0.02..8.85 rows=1 width=2134)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4)
-> CTE Scan on uuu uu (cost=0.00..0.02 rows=1 width=4)
-> Index Scan using mp_locs12_pkey on mp_locs12 dml1
(cost=0.00..8.81 rows=1 width=2134)
Index Cond: (dml1.id = uu.id)
-> Seq Scan on mp_locs12 dml2 (cost=0.02..1750366.67 rows=15581266
width=1946)
Filter: (hashed SubPlan 2)
SubPlan 2
-> CTE Scan on uuu uu (cost=0.00..0.02 rows=1 width=4)
(14 rows)

Now try to use bit modified query:

explain with uuu as (
select dml.id
from mp_locs12 dml
where (complex conditions, leaving about 100 rows from millions)
)
select label, country, region, parish, city, district,
st_geometrytype(loc::geometry) as gtype,
'0x' || to_hex(type) as n_type, file_name, line
from mp_locs12 dml1
where dml1.id in (select uu.id from uuu uu)
and lower(st_geometrytype(dml1.loc::geometry)) not in
('st_geometrycollection')
and not exists (
select 1 from (
select dml2.id as id from mp_locs12 dml2
where dml2.id in (select uu.id from uuu uu)
and not st_contains(dml1.loc::geometry,
dml2.loc::geometry)
and lower(st_geometrytype(dml2.loc::geometry)) not in
('st_geometrycollection')
) vv
where vv.id <> dml1.id
);

Only thing I changed - scanned CTE for ids to get, then compared taken
ids with dml1.id. And now planner chose best plan (I think so):

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2139898.32..2139916.30 rows=1 width=2130)
CTE uuu
-> Seq Scan on mp_locs12 dml (cost=0.00..2139898.30 rows=1 width=4)
Filter: (complex conditions, leaving about 100 rows from
millions)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4)
-> CTE Scan on uuu uu (cost=0.00..0.02 rows=1 width=4)
-> Index Scan using mp_locs12_pkey on mp_locs12 dml1
(cost=0.00..17.95 rows=1 width=2134)
Index Cond: (dml1.id = uu.id)
Filter: ((lower(st_geometrytype((dml1.loc)::geometry)) <>
'st_geometrycollection'::text) AND (NOT (SubPlan 2)))
SubPlan 2
-> Nested Loop (cost=0.02..9.13 rows=1 width=0)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4)
-> CTE Scan on uuu uu (cost=0.00..0.02 rows=1
width=4)
-> Index Scan using mp_locs12_pkey on mp_locs12 dml2
(cost=0.00..9.08 rows=1 width=4)
Index Cond: (dml2.id = uu.id)
Filter: ((dml2.id <> $2) AND
(lower(st_geometrytype((dml2.loc)::geometry)) <>
'st_geometrycollection'::text) AND ((NOT (($1)::geometry &&
(dml2.loc)::geometry)) OR (NOT _st_contains(($1)::geometry,
(dml2.loc)::geometry))))
(16 rows)

What makes planner o choose so differently in almost same cases? Is it a
bug, or I misunderstood something?

Regards
Dmitry

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Farina 2012-03-20 08:35:58 Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)
Previous Message Noah Misch 2012-03-20 05:07:52 Re: [PATCH] Support for foreign keys with arrays