Re: Bad plan after vacuum analyze

From: Guillaume Smet <guillaume_ml(at)smet(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Bad plan after vacuum analyze
Date: 2005-05-11 17:23:32
Message-ID: 42823F94.8050009@smet.org
Lists: pgsql-performance

Hi,

We have a performance problem with a particular query.

We reproduced the problem on a 7.4.5 and on a 7.4.7 server.
* we load the dump into a new database
* query: it's fast (< 1ms)
* VACUUM FULL ANALYZE;
* query: it's really slow (130ms) and it uses another plan
* set enable_seqscan=off;
* query: it's fast (< 1ms): it uses the best plan

I attached the EXPLAIN ANALYZE outputs, the query and the table
descriptions. I really can't understand why the planner chooses this plan,
and especially the line:
-> Index Scan using acs_objects_object_id_p_hhkb1 on acs_objects t98
(cost=0.00..2554.07 rows=33510 width=81) (actual time=0.043..56.392
rows=33510 loops=1).
I have never seen an index scan over so many rows. For your
information, there are 33510 rows in this table, so it scans the whole
table.

The problem seems to be the left join on the acs_objects t98 table for
parent_application_id: if I remove it or change it to a subquery, it's
OK. The query is generated automatically by a persistence layer, so I
can't really rewrite it.

Thanks for any help

Regards

--
Guillaume

Attachment Content-Type Size
plan_after_vacuum.txt text/plain 2.9 KB
plan_after_vacuum_seqscan_off.txt text/plain 2.6 KB
plan_before_vacuum.txt text/plain 2.4 KB
query_section.sql text/x-sql 2.5 KB
tables.txt text/plain 5.0 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Guillaume Smet <guillaume_ml(at)smet(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 18:10:15
Message-ID: 200505111110.15983.josh@agliodbs.com
Lists: pgsql-performance

Guillaume,

> We reproduced the problem on a 7.4.5 and on a 7.4.7 server.
> * we load the dump in a new database
> * query: it's fast (< 1ms)
> * VACUUM FULL ANALYZE;
> * query: it's really slow (130ms) and it's another plan
> * set enable_seqscan=off;
> * query: it's fast (< 1ms) : it uses the best plan

Looking at this, the planner seems convinced that the merge join is the
easiest way to do the OUTER JOINS, but it appears to be wrong; a nested loop
is faster.

This isn't the only place I've encountered our optimizer doing this --
underestimating the cost of a merge join. This seems to be because the
merge join vs. nested loop decision is made in the planner without
taking the double-sort and index access costs into account. This
query is an excellent example:

"good" plan:
Nested Loop Left Join (cost=2.44..17.36 rows=1 width=5532) (actual
time=0.441..0.466 rows=1 loops=1)
Join Filter: ("outer".parent_application_id = "inner".application_id)
-> Nested Loop Left Join (cost=2.44..15.73 rows=1 width=5214) (actual
time=0.378..0.402 rows=1 loops=1)

See, here the planner thinks that the 2 nested loops will cost "35".

"bad" plan:
Merge Right Join (cost=9.27..9.48 rows=1 width=545) (actual
time=129.364..129.365 rows=1 loops=1)
Merge Cond: ("outer".application_id = "inner".parent_application_id)
-> Index Scan using applicati_applicati_id_p_ogstm on applications t116
(cost=0.00..5.51 rows=28 width=20) (actual time=0.030..0.073 rows=28 loops=1)
-> Sort (cost=9.27..9.27 rows=1 width=529) (actual time=129.202..129.203
rows=1 loops=1)
Sort Key: t22.parent_application_id
-> Merge Right Join (cost=8.92..9.26 rows=1 width=529) (actual
time=129.100..129.103 rows=1 loops=1)
Merge Cond: ("outer".object_id = "inner".parent_application_id)
-> Index Scan using acs_objects_object_id_p_hhkb1 on
acs_objects t98 (cost=0.00..2554.07 rows=33510 width=81) (actual
time=0.043..56.392 rows=33510 loops=1)
-> Sort (cost=8.92..8.93 rows=1 width=452) (actual
time=0.309..0.310 rows=1 loops=1)
Sort Key: t22.parent_application_id

Here the planner chooses a merge right join. This decision seems to have been
made entirely on the basis of the cost of the join itself (total of 17)
without taking the cost of the sort and index access (total of 2600+) into
account.

Tom, is this a possible error in planner logic?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: josh(at)agliodbs(dot)com
Cc: Guillaume Smet <guillaume_ml(at)smet(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 18:58:46
Message-ID: 13964.1115837926@sss.pgh.pa.us
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> -> Merge Right Join (cost=8.92..9.26 rows=1 width=529) (actual
> time=129.100..129.103 rows=1 loops=1)
> Merge Cond: ("outer".object_id = "inner".parent_application_id)
> -> Index Scan using acs_objects_object_id_p_hhkb1 on
> acs_objects t98 (cost=0.00..2554.07 rows=33510 width=81) (actual
> time=0.043..56.392 rows=33510 loops=1)
> -> Sort (cost=8.92..8.93 rows=1 width=452) (actual
> time=0.309..0.310 rows=1 loops=1)
> Sort Key: t22.parent_application_id

> Here the planner chooses a merge right join. This decision seems to have been
> made entirely on the basis of the cost of the join itself (total of 17)
> without taking the cost of the sort and index access (total of 2600+) into
> account.

> Tom, is this a possible error in planner logic?

No, it certainly hasn't forgotten to add in the costs of the inputs.
There might be a bug here, but if so it's much more subtle than that.

It looks to me like the planner believes that the one value of
t22.parent_application_id joins to something very early in the
acs_objects_object_id_p_hhkb1 sort order, and that it will therefore not
be necessary to run the indexscan to completion (or indeed very far at
all, considering that it's including such a small fraction of the total
indexscan cost).

andrew(at)supernews pointed out recently that this effect doesn't apply to
the outer side of an outer join; releases before 7.4.8 mistakenly think
it does. But unless my wires are totally crossed today, acs_objects is
the nullable side here and so that error isn't applicable anyway.

So, the usual questions: have these two tables been ANALYZEd lately?
If so, can we see the pg_stats rows for the object_id and
parent_application_id columns?

regards, tom lane


From: Guillaume Smet <guillaume_ml(at)smet(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 19:32:11
Message-ID: 42825DBB.7000703@smet.org
Lists: pgsql-performance

Tom,

> So, the usual questions: have these two tables been ANALYZEd lately?

Yes, of course.
As I wrote in my previous mail, here is how I reproduce the problem:
- we load the dump into a new database (to be sure there is no problem
with an index or something like that)
- query: it's fast (< 1ms)
- *VACUUM FULL ANALYZE;*
- query: it's really slow (130ms) and it uses another plan
- set enable_seqscan=off;
- query: it's fast (< 1ms): it uses the best plan

I reproduced it on two different servers exactly like that (7.4.5 and
7.4.7).

I first met the problem on a production database with a VACUUM ANALYZE
run every night (and we don't have too many inserts a day on this database).

> If so, can we see the pg_stats rows for the object_id and
> parent_application_id columns?

See attached file.

If you're interested in a dump of these tables, just tell me. There
isn't any confidential information in them.

Regards

--
Guillaume

Attachment Content-Type Size
stats.txt text/plain 1.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Smet <guillaume_ml(at)smet(dot)org>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 19:38:16
Message-ID: 14301.1115840296@sss.pgh.pa.us
Lists: pgsql-performance

Guillaume Smet <guillaume_ml(at)smet(dot)org> writes:
>> If so, can we see the pg_stats rows for the object_id and
>> parent_application_id columns?

> See attached file.

Well, those stats certainly appear to justify the planner's belief that
the indexscan needn't run very far: the one value of
parent_application_id is 1031 and this is below the smallest value of
object_id seen by analyze. You might have better luck if you increase
the statistics target for acs_objects.object_id. (It'd be interesting
to know what fraction of acs_objects actually does have object_id < 1032.)

regards, tom lane


From: Guillaume Smet <guillaume_ml(at)smet(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 19:57:02
Message-ID: 4282638E.2020307@smet.org
Lists: pgsql-performance

> Well, those stats certainly appear to justify the planner's belief that
> the indexscan needn't run very far: the one value of
> parent_application_id is 1031 and this is below the smallest value of
> object_id seen by analyze.

Yes, that seems rather logical, but why does it cost so much when it
should be an efficient way to find the row?

> You might have better luck if you increase
> the statistics target for acs_objects.object_id.

What do you mean exactly?

> (It'd be interesting
> to know what fraction of acs_objects actually does have object_id < 1032.)

ccm_perf=# SELECT COUNT(*) FROM acs_objects WHERE object_id<1032;
count
-------
15

ccm_perf=# SELECT COUNT(*) FROM acs_objects;
count
-------
33510

--
Guillaume


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Smet <guillaume_ml(at)smet(dot)org>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 20:32:35
Message-ID: 17017.1115843555@sss.pgh.pa.us
Lists: pgsql-performance

Ah-ha, I can replicate the problem. This example uses tenk1 from the
regression database, which has a column unique2 containing just the
integers 0..9999.

regression=# create table t1(f1 int);
CREATE TABLE
regression=# insert into t1 values(5);
INSERT 154632 1
regression=# insert into t1 values(7);
INSERT 154633 1
regression=# analyze t1;
ANALYZE
regression=# explain analyze select * from tenk1 right join t1 on (unique2=f1);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=1.03..1.37 rows=2 width=248) (actual time=0.507..0.617 rows=2 loops=1)
Merge Cond: ("outer".unique2 = "inner".f1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.00..498.24 rows=10024 width=244) (actual time=0.126..0.242 rows=9 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.145..0.153 rows=2 loops=1)
Sort Key: t1.f1
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=4) (actual time=0.029..0.049 rows=2 loops=1)
Total runtime: 1.497 ms
(7 rows)

The planner correctly perceives that only a small part of the unique2
index will need to be scanned, and hence thinks the merge is cheap ---
much cheaper than if the whole index had to be scanned. And it is.
Notice that only 9 rows were actually pulled from the index. Once
we got to unique2 = 8, nodeMergejoin.c could see that no more matches
to f1 were possible.
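The short-circuit Tom describes can be sketched in Python (a toy model of
the merge join's early stop, with my own illustrative names -- not
PostgreSQL's actual nodeMergejoin.c):

```python
def merge_right_join(index_values, outer_values):
    """Toy merge right join: index_values plays the sorted index scan on
    tenk1.unique2, outer_values the small sorted input t1.f1.  Returns the
    joined rows plus how many index entries had to be read."""
    results, scanned, i = [], 0, 0
    for f1 in sorted(outer_values):
        # Advance the index scan only as far as the current outer value.
        while i < len(index_values) and index_values[i] < f1:
            i += 1
        if i < len(index_values) and index_values[i] == f1:
            results.append((f1, index_values[i]))
        else:
            results.append((f1, None))  # right join keeps unmatched rows
        scanned = max(scanned, min(i + 1, len(index_values)))
    return results, scanned

tenk = list(range(10000))
rows, scanned = merge_right_join(tenk, [5, 7])
# Only a handful of the 10000 index entries are read: the scan never
# needs to go past the largest outer value.
```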

But watch this:

regression=# insert into t1 values(null);
INSERT 154634 1
regression=# explain analyze select * from tenk1 right join t1 on (unique2=f1);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=1.03..1.37 rows=2 width=248) (actual time=0.560..290.874 rows=3 loops=1)
Merge Cond: ("outer".unique2 = "inner".f1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.00..498.24 rows=10024 width=244) (actual time=0.139..106.982 rows=10000 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.181..0.194 rows=3 loops=1)
Sort Key: t1.f1
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=4) (actual time=0.032..0.067 rows=3 loops=1)
Total runtime: 291.670 ms
(7 rows)

See what happened to the actual costs of the indexscan? All of a sudden
we had to scan the whole index because there was a null in the other
input, and nulls sort high.

I wonder if it is worth fixing nodeMergejoin.c to not even try to match
nulls to the other input. We'd have to add a check to see if the join
operator is strict or not, but it nearly always will be.

The alternative would be to make the planner believe in the
short-circuit path only if it thinks that the other input is entirely
non-null ... but this seems pretty fragile, since it only takes one null
to mess things up, and ANALYZE can hardly be counted on to detect one
null in a table.
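In the same toy model, the strict-operator idea could look roughly like
this (an illustrative sketch, not the actual nodeMergejoin.c change):
since a strict join operator can never return true for NULL, NULL rows
can be emitted unmatched without advancing the index scan at all,
avoiding the full-index walk seen above.

```python
def merge_right_join_strict(index_values, outer_values):
    """Toy merge right join that skips NULLs (None) in the small input:
    with a strict join operator NULL can never match, so NULL rows are
    emitted unmatched instead of dragging the index scan to the end
    (NULLs sort high here, as in PostgreSQL)."""
    results, scanned, i = [], 0, 0
    non_null = sorted(v for v in outer_values if v is not None)
    for f1 in non_null:
        while i < len(index_values) and index_values[i] < f1:
            i += 1
        if i < len(index_values) and index_values[i] == f1:
            results.append((f1, index_values[i]))
        else:
            results.append((f1, None))
        scanned = max(scanned, min(i + 1, len(index_values)))
    # Strict operator: NULLs join to nothing and cost no index reads.
    for _ in (v for v in outer_values if v is None):
        results.append((None, None))
    return results, scanned

tenk = list(range(10000))
rows, scanned = merge_right_join_strict(tenk, [5, 7, None])
# The NULL no longer forces a scan of all 10000 index entries.
```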

In the meantime it seems like the quickest answer for Guillaume might
be to try to avoid keeping any NULLs in parent_application_id.

regards, tom lane


From: Guillaume Smet <guillaume_ml(at)smet(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 20:59:40
Message-ID: 4282723C.5080903@smet.org
Lists: pgsql-performance

Josh, Tom,

Thanks for your explanations.

> In the meantime it seems like the quickest answer for Guillaume might
> be to try to avoid keeping any NULLs in parent_application_id.

I can't do that, as the majority of the applications don't have a
parent one. Moreover, we use a third-party application and we cannot
modify all its internals.

Anyway, I tried to work on the statistics as you told me and here are
the results:
ccm_perf=# ALTER TABLE acs_objects ALTER COLUMN object_id SET STATISTICS 30;
ALTER TABLE
ccm_perf=# ANALYZE acs_objects;
ANALYZE

ccm_perf=# \i query_section.sql
... correct plan ...
Total runtime: 0.555 ms

So I think I will use this solution for the moment.

Thanks a lot for your help.

Regards

--
Guillaume


From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-11 21:27:45
Message-ID: 1115846865.428278d135357@webmail.telus.net
Lists: pgsql-performance

Quoting Guillaume Smet <guillaume_ml(at)smet(dot)org>:

> Hi,
>
> We have some performances problem on a particular query.
...

I have to say it: this was the best laid-out set of details behind a
problem I've ever seen on this list. I'm going to try to live up to it
the next time I have a problem of my own.


From: Markus Bertheau <twanger(at)bluetwanger(dot)de>
To: Guillaume Smet <guillaume_ml(at)smet(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan after vacuum analyze
Date: 2005-05-13 18:35:35
Message-ID: 1116009335.7327.0.camel@localhost.localdomain
Lists: pgsql-performance

On Wed, 2005-05-11 at 22:59 +0200, Guillaume Smet wrote:

> Anyway, I tried to work on the statistics as you told me and here are
> the results:
> ccm_perf=# ALTER TABLE acs_objects ALTER COLUMN object_id SET STATISTICS 30;
> ALTER TABLE
> ccm_perf=# ANALYZE acs_objects;
> ANALYZE
>
> ccm_perf=# \i query_section.sql
> ... correct plan ...
> Total runtime: 0.555 ms

Given Tom's analysis, how can increasing the stats target change which
plan is chosen?

--
Markus Bertheau <twanger(at)bluetwanger(dot)de>