Proposal for new SET variables for optimizer costs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposal for new SET variables for optimizer costs
Date: 2000-02-05 21:29:26
Message-ID: 14601.949786166@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I am about to implement some changes to the planner/optimizer's cost
model, following up to the thread on pghackers beginning on 20 Jan.
The main conclusion of that thread was that we needed to charge more for
a page fetched nonsequentially than for a page fetched sequentially.
After further investigation I have concluded that it is also appropriate
to include explicit modeling of the cost of evaluation of WHERE clauses.
For example, using the regression database and a query like

select * from tenk1 where
(unique1 = 1 and unique2 = 101) or
(unique1 = 2 and unique2 = 102) or
(unique1 = 3 and unique2 = 103) or
... 100 OR clauses ...
(unique1 = 100 and unique2 = 200);

(which is not too implausible for certain automatic query generators),
I observe that a sequential scan takes about 6 seconds, vs. less than
a second for a similar query with only 10 clauses. That says that the
cost of evaluating a WHERE clause this large is far from negligible.
The optimizer needs to account for this because different query plans
can have a considerable impact on the number of tuples that the WHERE
clause is evaluated for --- in this example, if we use indexscans to
pull out just the tuples with the right values of 'unique1', then the
WHERE clause need only be checked at 100 tuples, not all 10000.

I believe it would be reasonable to charge a certain amount per operator
or function appearing in the WHERE clause in order to account for this
effect. (Currently I see no need to model the cost of evaluating the
targetlist expressions. The same expressions should get evaluated for
the same tuples no matter what query plan the optimizer picks, so we
might as well just leave that cost out of our comparisons.)

Also, as was previously mentioned on pghackers, I would like to add SET
variables to control enabling/disabling of particular query plan types,
so that different plans can be checked with less hassle than restarting
psql with a new PGOPTIONS setting.

This all leads to the following proposal for redoing the optimizer
plan cost SET variables. The variables proposed below would replace
COST_HEAP and COST_INDEX, which are poorly named IMHO and are definitely
very misleadingly documented at present.

(Note that all costs will still be referenced to the cost of a disk page
fetch. We will take 1.0 as the cost of a sequential page fetch.)

SET variable name Internal variable Proposed default

RANDOM_PAGE_COST random_page_cost 4.0

Cost of fetching a disk block nonsequentially (as a multiple of the cost
of a sequential block fetch).

CPU_TUPLE_COST cpu_tuple_cost 0.01

Cost of CPU time per tuple processed within a query (as a fraction of
the cost of a sequential disk block fetch). This renames the existing
SET variable COST_HEAP (cpu_page_weight); but the default value is
smaller than it used to be, since WHERE clause evaluation will now be
accounted for separately.

CPU_INDEX_TUPLE_COST cpu_index_tuple_cost 0.001

Cost of CPU time per index tuple processed within a query (as a fraction
of the cost of a sequential disk block fetch). This renames the
existing SET variable COST_INDEX (cpu_index_page_weight); but the
default value is much smaller than it used to be, since the operator
evaluation cost will account for the bulk of the cost of visiting an
index tuple.

CPU_OPERATOR_COST cpu_operator_cost 0.0025

Cost of CPU time per operator or function evaluated in a WHERE clause.
Note that this would apply to operators evaluated at index tuples as
well as those evaluated against heap tuples.
(The proposed default corresponds to a ratio of 5 microsec against 2
millisec for a sequential block fetch, which seems to be about right
on my workstation.)

ENABLE_SEQSCAN enable_seqscan ON

ENABLE_INDEXSCAN enable_indexscan ON

ENABLE_TIDSCAN enable_tidscan ON

ENABLE_SORT enable_sort ON

ENABLE_NESTLOOP enable_nestloop ON

ENABLE_MERGEJOIN enable_mergejoin ON

ENABLE_HASHJOIN enable_hashjoin ON

Provide access via SET to the already-existing internal optimizer
control flags.

Currently, it is possible to have COST_HEAP and COST_INDEX set
automatically during connection startup; libpq will do that if
the environment variables PGCOSTHEAP and/or PGCOSTINDEX are defined
on the client side. If we want to continue that behavior, the
environment variables for these variables would be named
PGRANDOMPAGECOST etc (remove underscores and prepend PG).
I'm not sure if we want to continue inventing client-side environment
variables, however.

Comments? Ideas for better names? Anyone object to renaming the
existing variables? (BTW, although it could be argued that this
might break existing scripts that set COST_HEAP or COST_INDEX,
I doubt that there are any ... and given the existing doco,
I doubt even more that anyone is setting appropriate values ...)

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-02-05 22:01:59 Re: [HACKERS] Proposal for new SET variables for optimizer costs
Previous Message Dmitry Samersoff 2000-02-05 21:01:09 RE: [HACKERS] Linux MANDRAKE startup startup script is broken ?