Re: Bad plan after vacuum analyze

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Smet <guillaume_ml(at)smet(dot)org>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 20:32:35
Message-ID: 17017.1115843555@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Ah-ha, I can replicate the problem. This example uses tenk1 from the
regression database, which has a column unique2 containing just the
integers 0..9999.

regression=# create table t1(f1 int);
CREATE TABLE
regression=# insert into t1 values(5);
INSERT 154632 1
regression=# insert into t1 values(7);
INSERT 154633 1
regression=# analyze t1;
ANALYZE
regression=# explain analyze select * from tenk1 right join t1 on (unique2=f1);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=1.03..1.37 rows=2 width=248) (actual time=0.507..0.617 rows=2 loops=1)
Merge Cond: ("outer".unique2 = "inner".f1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.00..498.24 rows=10024 width=244) (actual time=0.126..0.242 rows=9 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.145..0.153 rows=2 loops=1)
Sort Key: t1.f1
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=4) (actual time=0.029..0.049 rows=2 loops=1)
Total runtime: 1.497 ms
(7 rows)

The planner correctly perceives that only a small part of the unique2
index will need to be scanned, and hence thinks the merge is cheap ---
much cheaper than if the whole index had to be scanned. And it is.
Notice that only 9 rows were actually pulled from the index. Once
we got to unique2 = 8, nodeMergejoin.c could see that no more matches
to f1 were possible.

But watch this:

regression=# insert into t1 values(null);
INSERT 154634 1
regression=# explain analyze select * from tenk1 right join t1 on (unique2=f1);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=1.03..1.37 rows=2 width=248) (actual time=0.560..290.874 rows=3 loops=1)
Merge Cond: ("outer".unique2 = "inner".f1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.00..498.24 rows=10024 width=244) (actual time=0.139..106.982 rows=10000 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.181..0.194 rows=3 loops=1)
Sort Key: t1.f1
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=4) (actual time=0.032..0.067 rows=3 loops=1)
Total runtime: 291.670 ms
(7 rows)

See what happened to the actual costs of the indexscan? All of a sudden
we had to scan the whole index because there was a null in the other
input, and nulls sort high.

I wonder if it is worth fixing nodeMergejoin.c to not even try to match
nulls to the other input. We'd have to add a check to see if the join
operator is strict or not, but it nearly always will be.

The alternative would be to make the planner only believe in the
short-circuit path occuring if it thinks that the other input is
entirely non-null ... but this seems pretty fragile, since it only
takes one null to mess things up, and ANALYZE can hardly be counted
on to detect one null in a table.

In the meantime it seems like the quickest answer for Guillaume might
be to try to avoid keeping any NULLs in parent_application_id.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Guillaume Smet 2005-05-11 20:59:40 Re: Bad plan after vacuum analyze
Previous Message Simon Riggs 2005-05-11 20:22:13 Re: Partitioning / Clustering