Re: Cluster vs. non-cluster query planning

From: Nolan Cafferky <Nolan(dot)Cafferky(at)rbsinteractive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Cluster vs. non-cluster query planning
Date: 2006-05-02 00:01:59
Message-ID: 4456A177.3020505@rbsinteractive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:

>Nolan Cafferky <Nolan(dot)Cafferky(at)rbsinteractive(dot)com> writes:
>
>>But, I'm guessing that random_page_cost = 1 is not a realistic value.
>>
>
>Well, that depends. If all your data can be expected to fit in memory
>then it is a realistic value. (If not, you should be real careful not
>to make performance decisions on the basis of test cases that *do* fit
>in RAM...)
>
>In any case, if I recall your numbers correctly you shouldn't need to
>drop it nearly that far to get the thing to make the right choice.
>A lot of people run with random_page_cost set to 2 or so.
>
Thanks for the advice. I will check what changing random_page_cost does
for the rest of the queries on our system.

I did learn why the estimated row count was so high. This is new
knowledge to me, so I'm going to share it.

SELECT reltuples FROM pg_class WHERE relname = 'orders'; -> produces 98426.
SELECT n_distinct FROM pg_stats WHERE tablename = 'orders' and attname =
'order_statuses_id'; -> currently 13.

Seq Scan on orders o (cost=1.20..11395.53 rows=7570 width=8) (actual
time=283.599..285.031 rows=105 loops=1)
Filter: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4)
(actual time=0.031..0.038 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
Total runtime: 285.225 ms

(98426 / 13)::integer = 7571 ~= 7570, the estimated row count.

So the query planner isn't able to combine the knowledge of the id value
from order_statuses with most_common_vals, most_common_freqs, or
histogram_bounds from pg_stats. That seems a little odd to me, but maybe
it makes sense. I suppose the planner can't start executing parts of the
query to aid in the planning process.

In the future, I will probably pre-select from order_statuses before
executing this query.

Thanks!

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mario Splivalo 2006-05-02 01:27:54 Lot'sa joins - performance tip-up, please?
Previous Message Tom Lane 2006-05-01 23:35:02 Re: Cluster vs. non-cluster query planning