Re: Inefficient plan selected by PostgreSQL 9.0.7

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Inefficient plan selected by PostgreSQL 9.0.7
Date: 2012-05-02 05:41:30
Message-ID: CAK-MWwSBpLpELxsNwUA78RSD2mkQFTdnrn1Q07AQ5j6+0n4NRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, May 2, 2012 at 2:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> writes:
> > I got very inefficient plan for a simple query.
>
> It looks like the problem is with the estimate of the antijoin size:
>
> > -> Nested Loop Anti Join (cost=0.00..24576.82 rows=1
> width=206)
> > (actual time=0.043..436.386 rows=20761 loops=1)
>
> that is, only about 20% of the rows in sb_messages are eliminated by the
> NOT EXISTS condition, but the planner thinks that nearly all of them
> will be (and that causes it to not think that the LIMIT is going to
> affect anything, so it doesn't prefer a fast-start plan).
>
> Since you've not told us anything about the statistics of these tables,
> it's hard to speculate as to why the estimate is off.
>
> regards, tom lane
>

Most interesting part that NOT EXISTS estimates way off, when LEFT JOIN
WHERE ... IS NULL esimated correctly:

good esitmate (estimated rows=20504 vs real rows=20760):
Game2=# EXPLAIN ANALYZE
SELECT
*
FROM sb_messages messages_tbl
LEFT JOIN users users_tbl ON users_tbl.id = messages_tbl.from_user
WHERE
messages_tbl.type IN (0, 9) AND
messages_tbl.visibility_status = 0 AND
messages_tbl.not_show_on_air = 'f' AND
messages_tbl.clan_id IS NULL AND
users_tbl.blocked IS DISTINCT FROM 't';

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..24577.74 rows=20504 width=1037) (actual
time=0.045..532.012 rows=20760 loops=1)
Filter: (users_tbl.blocked IS DISTINCT FROM true)
-> Index Scan using sb_messages_special3_key on sb_messages
messages_tbl (cost=0.00..3793.75 rows=35784 width=208) (actual
time=0.019..67.746 rows=24937 loops=1)
-> Index Scan using sb_users_pkey on users users_tbl (cost=0.00..0.53
rows=1 width=829) (actual time=0.007..0.009 rows=1 loops=24937)
Index Cond: (users_tbl.id = messages_tbl.from_user)
Total runtime: 563.944 ms

bad estimate (estimated 1 vs real rows=20760):
Game2=# EXPLAIN (ANALYZE, COSTS) SELECT * FROM sb_messages messages_tbl
WHERE
(messages_tbl.type IN (0, 9) AND messages_tbl.visibility_status=0 AND
messages_tbl.not_show_on_air='f' AND messages_tbl.clan_id IS NULL)
AND NOT EXISTS (SELECT 1 FROM users users_tbl WHERE blocked='t' and
users_tbl.id = messages_tbl.from_user);

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=0.00..24488.28 rows=1 width=208) (actual
time=0.044..430.645 rows=20760 loops=1)
-> Index Scan using sb_messages_special3_key on sb_messages
messages_tbl (cost=0.00..3793.75 rows=35784 width=208) (actual
time=0.020..67.810 rows=24937 loops=1)
-> Index Scan using sb_users_pkey on users users_tbl (cost=0.00..0.53
rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=24937)
Index Cond: (users_tbl.id = messages_tbl.from_user)
Filter: users_tbl.blocked
Total runtime: 461.296 ms

What is curious that not exists always perform 20% faster (I performed both
explains like 10 times each and each time not exits is close to 20% faster).

--
Maxim Boguk
Senior Postgresql DBA.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chitra Creta 2012-05-02 06:11:38 Re: PostgreSQL 8.3 data corruption
Previous Message Evan Martin 2012-05-02 05:34:20 Re: SQL functions not being inlined