Re: Query performance discontinuity

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: miken(at)bigpond(dot)net(dot)au (Mike Nielsen)
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query performance discontinuity
Date: 2002-11-29 18:24:44
Message-ID: d6cfuugdpi7f16q14ft47spmnhmfmlgcll@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 15 Nov 2002 03:26:32 +0000 (UTC), in
comp.databases.postgresql.performance you wrote:

> -> Seq Scan on ps2 (cost=0.00..13783.40 rows=327895 width=179)
^^^^^
>(actual time=0.15..15211.49 rows=327960 loops=1)
>
> -> Index Scan using ps2_idx on ps2 (cost=0.00..881616.45 rows=327895
^^^^^^
>width=179) (actual time=40.38..2151.38 rows=59629 loops=1)
^^^^
>
>The ps2 table is in time_stamp order, but the tstarts aren't quite as
>good -- they're mostly there, but they're computed by subtracting a
>(stochastic) value from time_stamp.

Mike,

this is the well known "divide correlation by number of index columns"
effect. This effect can be masked to a certain degree by reducing
random_page_cost, as has already been suggested.

The estimated index scan cost is also influenced by
effective_cache_size; its default value is 1000. Try

SET effective_cache_size = 50000;

This should help a bit, but please don't expect a big effect.

I'm running Postgres 7.2 with a modified index cost estimator here.
The patch is at http://www.pivot.at/pg/16-correlation.diff

This patch gives you two new GUC variables.

index_cost_algorithm: allows you to select between different methods
of interpolating between best case and worst case. 0 is the standard
behavior (before the patch), 1 to 4 tend more and more towards lower
index scan costs. See the switch statement in costsize.c for details.
Default = 3.

secondary_correlation: is a factor that is used to reduce the
correlation of the first index column a little bit once for each
additional index column. Default = 0.95.

With default settings you should get an index cost estimate between
20000 and 30000. Which allows you to increase random_page_cost to a
more reasonable value of something like 10 or even higher.

If you try it, please let me know how it works for you.

Servus
Manfred

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Laurette Cisneros 2002-11-30 16:40:04 Re: Low Budget Performance, Part 2
Previous Message David Jericho 2002-11-29 02:58:31 Re: H/W RAID 5 on slower disks versus no raid on faster HDDs