MIN() performance regression 8.0 -> 8.1

Lists: pgsql-hackers
From: Paul Lindner <lindner(at)inuus(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: MIN() performance regression 8.0 -> 8.1
Date: 2005-12-04 15:57:15
Message-ID: 20051204155715.GC10317@inuus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I believe I've found a performance regression between 8.0 and 8.1 for
some cases. For some queries it appears that the old MIN() ran much
faster than the new optimized MIN().

The following set of query plans illustrates the problem. (schema
names changed to protect the innocent...)

Basically it appears that the planner favors a Filter instead of an
Index Cond unless I significantly increase the cpu_tuple_cost. Also
note that the filter adds a IS NOT NULL condition for a column already
defined as IS NOT NULL.

[[Note: Results were obtained by running three hours worth of SELECT
traffic through an 8.0 and 8.1 instance, then analyzing the before and after
for speedups and regressions, scripts available on request.]]

*** The table:

typepad_20051127=# \d mm_moo_summary;
Table "public.moo_summary"
Column | Type | Modifiers
-----------------------+--------------------------+-----------
moo_summary_id | integer | not null
moo_summary_b_id | integer |
moo_summary_set_id | integer |
moo_summary_hits | integer |
moo_summary_date | timestamp with time zone | not null
Indexes:
"mm_moo_summary_pkey" PRIMARY KEY, btree (moo_summary_id)
"mm_moo_summary_b_id" btree (moo_summary_b_id)
"mm_moo_summary_date" btree (moo_summary_date)
"mm_moo_summary_set_id" btree (moo_summary_set_id)

*** 8.0 Query Plans:

explain SELECT MIN(moo_summary_date) FROM mm_moo_summary WHERE (moo_summary_b_id = '215222')

Aggregate (cost=5730.96..5730.96 rows=1 width=8)
-> Index Scan using mm_moo_summary_b_id on mm_moo_summary (cost=0.00..5725.62 rows=2137 width=8)
Index Cond: (moo_summary_b_id = 215222)

explain SELECT moo_summary_date FROM mm_moo_summary WHERE (moo_summary_b_id = '215222') order by moo_summary_date limit 1

Limit (cost=0.00..195.96 rows=1 width=8)
-> Index Scan using mm_moo_summary_date on mm_moo_summary (cost=0.00..418775.56 rows=2137 width=8)
Filter: (moo_summary_b_id = 21522)

explain SELECT moo_summary_date FROM mm_moo_summary WHERE (moo_summary_b_id = '215222') order by moo_summary_date

Sort (cost=5843.81..5849.15 rows=2137 width=8)
Sort Key: moo_summary_date
-> Index Scan using mm_moo_summary_b_id on mm_moo_summary (cost=0.00..5725.62 rows=2137 width=8)
Index Cond: (moo_summary_b_id = 21522)

*** 8.1 Query Plan:

explain SELECT MIN(moo_summary_date) FROM mm_moo_summary WHERE (moo_summary_b_id = '215222')

Result (cost=1988.44..1988.45 rows=1 width=0)
InitPlan
-> Limit (cost=0.00..1988.44 rows=1 width=8)
-> Index Scan using mm_moo_summary_date on mm_moo_summary (cost=0.00..4239343.73 rows=2132 width=8)
Filter: ((moo_summary_b_id = 215222) AND (moo_summary_date IS NOT NULL))

typepad_20051127=# set cpu_tuple_cost = .55;

SET cpu_tuple_cost = .55;
EXPLAIN SELECT MIN(stats_summary_date) from mm_moo_summary where moo_summary_b_id = 21522;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=5524.70..5525.25 rows=1 width=8)
-> Index Scan using mm_moo_summary_blog_id on mm_moo_summary (cost=0.00..5519.37 rows=2132 width=8)
Index Cond: (moo_summary_b_id = 21522)
(3 rows)

--
Paul Lindner ||||| | | | | | | | | |
lindner(at)inuus(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paul Lindner <lindner(at)inuus(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MIN() performance regression 8.0 -> 8.1
Date: 2005-12-05 03:05:35
Message-ID: 12394.1133751935@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Paul Lindner <lindner(at)inuus(dot)com> writes:
> I believe I've found a performance regression between 8.0 and 8.1 for
> some cases. For some queries it appears that the old MIN() ran much
> faster than the new optimized MIN().

When you are complaining about planner mistakes, showing EXPLAIN rather
than EXPLAIN ANALYZE output is an entirely ineffective way of making
your point.

I suspect the problem is that there is a correlation between the
moo_summary_date and moo_summary_b_id columns, which the planner
will not realize --- is that the case?

regards, tom lane