random_page_cost vs seq_page_cost

From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: random_page_cost vs seq_page_cost
Date: 2012-01-05 10:04:05
Message-ID: 20120105100405.GJ15490@ldn-qws-004.delacy.com
Lists: pgsql-hackers

Hello list,

I have a question about how to benchmark hardware to determine
the appropriate ratio of seq_page_cost to random_page_cost.

Emails in this mailing list's archive seem to indicate that
1.0 vs. 3.0-4.0 are appropriate values on modern hardware.

That surprised me a bit, as I had thought that on actual
hard drives (ignoring SSDs) the true cost of a random read
relative to a sequential one is much higher. I guess the
number tries to reflect caching of the relevant pages in
memory, and on modern hardware you simply have more of that?

Anyhow, it would be great to have a scientific way of
setting those numbers.
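In the absence of anything better, here is a crude sketch of what I
mean by measuring rather than guessing: time sequential vs. random
8 kB reads on a scratch file. (This is only illustrative; a real
measurement would need a file much larger than RAM, or O_DIRECT /
dropped caches, otherwise the OS page cache hides the seek cost
entirely.)

```python
import os
import random
import tempfile
import time

PAGE = 8192    # PostgreSQL block size
PAGES = 4096   # 32 MB scratch file -- far too small for a real test

def measure_ratio():
    """Return (time for random reads) / (time for sequential reads).

    Illustrative only: with a file this small everything is served
    from the OS page cache, so the ratio will be nowhere near what
    a cold spinning disk would show.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * (PAGE * PAGES))

        order = list(range(PAGES))
        random.shuffle(order)

        with open(path, "rb") as f:
            # Sequential pass: read the file front to back.
            t0 = time.perf_counter()
            for _ in range(PAGES):
                f.read(PAGE)
            t_seq = time.perf_counter() - t0

            # Random pass: read the same pages in shuffled order.
            t0 = time.perf_counter()
            for i in order:
                f.seek(i * PAGE)
                f.read(PAGE)
            t_rand = time.perf_counter() - t0

        return t_rand / t_seq
    finally:
        os.remove(path)
```

Something along these lines, run against the actual data volume with
caches accounted for, is the kind of "scientific" input I'd like to
feed into random_page_cost.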

Background:

We have recently upgraded two of our biggest Postgres databases
to new hardware, together with a minor version bump (8.4.5 -> 8.4.9).

We are experiencing a big performance regression in some queries.
In those cases the planner seems to choose a nested-loop join
with an inner index scan instead of hashing one side once and
then joining.

The new hardware was optimized for seq scans and does those
very fast.

We are not sure whether the database made different choices
before the move and the new hardware is simply performing
worse for random seeks, or whether the planner is now making
different choices.

As a countermeasure we are experimenting with:

enable_nestloop = off
random_page_cost = 20 (instead of the previous 4)
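For what it's worth, we try such changes in a single session first
before touching postgresql.conf; something like the following, where
big_a and big_b stand in for the real tables:

```sql
-- Session-local experiment; big_a / big_b are placeholder names.
SET enable_nestloop = off;
SET random_page_cost = 20;

EXPLAIN ANALYZE
SELECT count(*)
FROM big_a a
JOIN big_b b ON b.a_id = a.id;

-- Undo the experiment for this session.
RESET enable_nestloop;
RESET random_page_cost;
```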

It is worth noting that for many small tables the nested loop
is indeed marginally faster (in the "doesn't really matter,
because both plans are fast enough" sense). But the regression
for the big tables (big in the sense that the index just fits
into memory, though in practice it might not, because there
are other frequently accessed big tables) is a show stopper.

Some of those tables we have also recently (as part of the
move) CLUSTERed for the first time in ages, and it was
speculated that this might have changed the statistics (such
as correlation) and increased the attractiveness of the
index scan to the planner.
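That speculation should be checkable directly: pg_stats exposes the
correlation estimate per column, and values near 1.0 (or -1.0) make
index scans look cheap to the planner. Table name below is a
placeholder:

```sql
-- Inspect the planner's correlation estimates after CLUSTER;
-- 'big_table' is a placeholder for one of the affected tables.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'big_table'
ORDER BY attname;
```

Comparing these numbers before and after the CLUSTER (or against the
other, un-clustered database) would tell us whether this theory holds.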

Another thing that I have thought might provide some
enlightenment would be something like

explain log select ...

which would show all the sub-plans considered and why they
were discarded, or something approximating that.
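The closest approximation I know of today is to forbid the winning
plan node with the enable_* settings and diff the two EXPLAINs, which
at least shows the cost the planner assigned to the rejected
alternative (query and table names are placeholders):

```sql
-- Plan and cost the planner actually chose.
EXPLAIN SELECT count(*) FROM big_a a JOIN big_b b ON b.a_id = a.id;

-- Forbid the nested loop and see what the runner-up costs.
SET enable_nestloop = off;
EXPLAIN SELECT count(*) FROM big_a a JOIN big_b b ON b.a_id = a.id;
RESET enable_nestloop;
```

But that only surfaces one alternative at a time, not the full set of
discarded sub-plans.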

Thanks in advance for any reply, and sorry that this email
turned out to be a rather long stream-of-consciousness dump.

Bene

------ relevant parts of our postgresql.conf -----------

shared_buffers = 12GB # min 128kB
# (change requires restart)
temp_buffers = 512MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 192MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
#max_stack_depth = 2MB # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 100ms # 0-100 milliseconds
vacuum_cost_page_hit = 1 # 0-10000 credits
vacuum_cost_page_miss = 10 # 0-10000 credits
vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 7500 # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -

effective_io_concurrency = 40 # 1-1000. 0 disables prefetching

# WRITE AHEAD LOG

fsync = on # turns forced synchronization on or off
synchronous_commit = off # immediate fsync at commit
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync
# fsync
# fsync_writethrough
# open_sync
full_page_writes = on # recover from partial page writes
wal_buffers = 16MB # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds

commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000

# - Checkpoints -

checkpoint_segments = 128 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s # 0 disables

# - Archiving -

archive_mode = off # allows archiving to be done

# QUERY TUNING

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 20.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 32GB # dpowers: set to 2/3 of available ram

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 1000 # range 1-10000
#constraint_exclusion = partition # on, off, or partition
cursor_tuple_fraction = 1.0 # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit
# JOIN clauses

Machine:
OS: linux 2.6.32-71.29.1
CPU 12 x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz Cache size: 12288 KB
RAM: 47.12 GB
