Re: reducing random_page_cost from 4 to 2 to force index scan

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <sokann(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-27 12:40:58
Message-ID: 4DB7C88A020000250003CF10@gw.wicourts.gov
Lists: pgsql-performance

Sok Ann Yap wrote:
> Kevin Grittner wrote:

>> Please show us your overall configuration and give a description
>> of the hardware (how many of what kind of cores, how much RAM,
>> what sort of storage system).

> Here's the configuration (this is just a low end laptop):

> version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu,
> compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0,
> pie-0.4.5) 4.5.2, 64-bit
> checkpoint_segments | 16
> default_statistics_target | 10000

Usually overkill. If this didn't help, you should probably change it
back.

> effective_cache_size | 512MB
> lc_collate | en_US.UTF-8
> lc_ctype | en_US.UTF-8
> listen_addresses | *
> log_destination | syslog
> log_min_duration_statement | 0
> maintenance_work_mem | 256MB
> max_connections | 100

You probably don't need this many connections.

> max_stack_depth | 2MB
> port | 5432
> random_page_cost | 4
> server_encoding | UTF8
> shared_buffers | 256MB
> silent_mode | on
> TimeZone | Asia/Kuala_Lumpur
> wal_buffers | 1MB
> work_mem | 32MB
> (20 rows)

It's hard to recommend other changes without knowing the RAM on the
system. Knowing how many of what kind of CPUs you have would help, too.

> The thing is, the query I posted was fairly simple (I think), and
> PostgreSQL should be able to choose the 3000+ times faster index
> scan with the default random_page_cost of 4.

It picks the plan with the lowest estimated cost. If it's not
picking the best plan, that's usually an indication that you need to
adjust cost factors so that estimates better model the actual costs.
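
One quick sanity check (just a sketch; the query and table names are the
ones reproduced later in this thread) is to run the problem query under
EXPLAIN ANALYZE and see whether the plan with the lower estimated cost is
also the one that actually runs faster:

EXPLAIN ANALYZE
SELECT salutations.id,
       salutations.name,
       EXISTS (SELECT 1
               FROM contacts
               WHERE salutations.id = contacts.salutation_id) AS in_use
FROM salutations;
-- The cost= figures are the planner's estimates (in arbitrary units);
-- the actual time= figures are measured.  The estimates only have to
-- rank the competing plans in the right order.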

> If I need to reduce it to 2 when using a 5.4k rpm slow disk, what
> is random_page_cost = 4 good for?

It's good for large databases with a lot of physical disk I/O. In
fact, in some of those cases, it needs to be higher. In your test,
the numbers indicate that everything was cached in RAM. That makes
the effective cost very low.

Also, the odds are that you have more total cache space between the
shared_buffers and the OS cache than the effective_cache_size
setting, so the optimizer doesn't expect the number of cache hits
you're getting on index usage.
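
If it helps, the hit rate the server itself sees (OS cache hits are
invisible to these counters) can be estimated from the statistics views;
a sketch:

-- Fraction of table block reads satisfied from shared_buffers since the
-- statistics were last reset; a value near 1.0 means mostly cached.
SELECT sum(heap_blks_hit)::numeric
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS hit_ratio
FROM pg_statio_user_tables;

-- Settings to compare against what the OS reports as cached memory.
SHOW shared_buffers;
SHOW effective_cache_size;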

-Kevin


From: Sok Ann Yap <sokann(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-27 22:34:28
Message-ID: BANLkTi=NFNTZ8n4zB0L+TLdurnSnnpx9Pw@mail.gmail.com
Lists: pgsql-performance

On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> If I need to reduce it to 2 when using a 5.4k rpm slow disk, what
>> is random_page_cost = 4 good for?
>
> It's good for large databases with a lot of physical disk I/O.  In
> fact, in some of those cases, it needs to be higher.  In your test,
> the numbers indicate that everything was cached in RAM.  That makes
> the effective cost very low.
>
> Also, the odds are that you have more total cache space between the
> shared_buffers and the OS cache than the effective_cache_size
> setting, so the optimizer doesn't expect the number of cache hits
> you're getting on index usage.
>
> -Kevin
>

Thanks for the tips and explanation. I wrongly assumed that the
random_page_cost value is independent of caching.

Now, let's go back to the original query:

SELECT
    salutations.id,
    salutations.name,
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE salutations.id = contacts.salutation_id
    ) AS in_use
FROM salutations

If I split up the query, i.e. running this once:

SELECT
    salutations.id,
    salutations.name
FROM salutations

and then running this 44 times, once for each row:

SELECT
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE contacts.salutation_id = ?
    ) AS in_use

I can see that PostgreSQL will smartly pick the best plan, i.e. for
common salutations (Madam, Ms, etc.), it will do a sequential scan,
while for salutations that are rarely used or not used at all, it will
do an index scan.
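
This is easy to confirm with EXPLAIN on the split-out query, for example
(the literal ids below are placeholders; substitute a genuinely common
and a genuinely rare salutation_id):

-- A common salutation: a match is expected early, so a short sequential
-- scan on contacts can look (and be) cheaper.
EXPLAIN SELECT EXISTS (SELECT 1 FROM contacts WHERE salutation_id = 1);

-- A rare or unused salutation: proving absence is cheapest via the
-- index, so the plan flips to an index scan.
EXPLAIN SELECT EXISTS (SELECT 1 FROM contacts WHERE salutation_id = 44);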

Anyway, the overhead of spawning 44 extra queries means that I am
still better off sticking with the original query and tuning
PostgreSQL to choose the index scan.


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Sok Ann Yap" <sokann(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-27 23:23:36
Message-ID: 4DB85F28020000250003CFA1@gw.wicourts.gov
Lists: pgsql-performance

Sok Ann Yap <sokann(at)gmail(dot)com> wrote:

> Anyway, the overhead of spawning 44 extra queries means that I am
> still better off sticking with the original query and tuning
> PostgreSQL to choose the index scan.

Maybe, but what is *best* for you is to tune PostgreSQL so that your
costs are accurately modeled, at which point it will automatically
pick the best plan for most or all of your queries without you
needing to worry about it.

If you set your effective_cache_size to the sum of shared_buffers
and what your OS reports as cache after you've been running a while,
that will help the optimizer know what size index fits in RAM, and
will tend to encourage index use. If the active portion of your
data is heavily cached, you might want to set random_page_cost and
seq_page_cost to the same value, and make that value somewhere in
the 0.1 to 0.05 range. If you have moderate caching, using 1 and 2
can be good.
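
As a sketch, that can be tried out in a single session before touching
postgresql.conf (the numbers below are only placeholders from the ranges
mentioned above, not recommendations for this particular machine):

-- Heavily cached data set: make random and sequential page access cost
-- the same.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;

-- Roughly shared_buffers plus what the OS reports as cache; the 1GB
-- here is a placeholder, not a measured value.
SET effective_cache_size = '1GB';

-- Then re-run EXPLAIN ANALYZE on the query and see whether the plan
-- switches to the index scan and the timing improves.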

If you're still not getting reasonable plans, please post again with
more information about your hardware along with the query and its
EXPLAIN ANALYZE output.

-Kevin


From: Sok Ann Yap <sokann(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-28 00:19:01
Message-ID: BANLkTi=u9HeA3odW7Vs8z_bBWacZrqVF0Q@mail.gmail.com
Lists: pgsql-performance

On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Maybe, but what is *best* for you is to tune PostgreSQL so that your
> costs are accurately modeled, at which point it will automatically
> pick the best plan for most or all of your queries without you
> needing to worry about it.
>
> If you set your effective_cache_size to the sum of shared_buffers
> and what your OS reports as cache after you've been running a while,
> that will help the optimizer know what size index fits in RAM, and
> will tend to encourage index use.  If the active portion of your
> data is heavily cached, you might want to set random_page_cost and
> seq_page_cost to the same value, and make that value somewhere in
> the 0.1 to 0.05 range.  If you have moderate caching, using 1 and 2
> can be good.

I understand the need to tune PostgreSQL properly for my use case.
What I am curious about is, for the data set I have, under what
circumstances (hardware/workload/cache status/etc) would a sequential
scan really be faster than an index scan for that particular query?

To simulate a scenario in which nothing is cached, I stopped PostgreSQL,
dropped all system cache (sync; echo 3 > /proc/sys/vm/drop_caches),
restarted PostgreSQL, and ran the query. A sequential scan run took
13.70 seconds, while an index scan run took 0.34 seconds, which is
still 40 times faster.

Also, I tried increasing effective_cache_size from 512MB to 3GB (the
database size is 2+GB), and it still favors a sequential scan. The
estimated costs did not change at all.
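
A sketch of how such a comparison can be made (enable_seqscan is used
here only as a diagnostic, and the caches have to be dropped again
between the two runs for both measurements to be cold):

-- Time whatever plan the planner picks by default (the sequential scan
-- in this case).
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.id, s.name,
       EXISTS (SELECT 1 FROM contacts c WHERE c.salutation_id = s.id) AS in_use
FROM salutations s;

-- Diagnostic only: discourage sequential scans for this session to time
-- the index plan, then restore the setting.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.id, s.name,
       EXISTS (SELECT 1 FROM contacts c WHERE c.salutation_id = s.id) AS in_use
FROM salutations s;
RESET enable_seqscan;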


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Sok Ann Yap <sokann(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-28 11:32:46
Message-ID: BANLkTimobiDzO89Dw5bnn8KbcorK-Y9BPw@mail.gmail.com
Lists: pgsql-performance

On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap <sokann(at)gmail(dot)com> wrote:

> I understand the need to tune PostgreSQL properly for my use case.
> What I am curious about is, for the data set I have, under what
> circumstances (hardware/workload/cache status/etc) would a sequential
> scan really be faster than an index scan for that particular query?
>

Possibly none on your hardware - if the index is likely to be in memory
along with the actual table rows, then the cost of an index scan (random
page cost) should be made much closer to the cost of sequential access.

It looks like the planner must use the same strategy on each iteration
of the loop - it can't do an index scan for some values and a sequential
scan for others - so it must be computing the cost as sequential_cost *
(number of entries, 44) versus random_cost * (number of entries). If
random page cost is unreasonably high, it's not hard to see how it could
wind up looking more expensive to the planner, causing it to choose the
sequential scan for each loop iteration. If it were able to change
strategy on each iteration, it would be able to accurately assess the
cost for each iteration and choose the correct strategy for that value.
As soon as you set the costs closer to the actual costs for your system,
postgres does make the correct choice.

If there weren't enough memory for postgres to be 'sure' that the index
would remain in cache for the duration of all 44 iterations due to high
workload, it is easy to see how the index scan might become
significantly more expensive than the sequential scan, since the index
scan must also load the referenced page from the table - postgres cannot
get values directly from the index.
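
To put rough numbers on that (made up purely for illustration, not taken
from this plan): if one index probe is estimated to touch 4 pages at
random_page_cost = 4, and a short sequential scan is estimated to touch
10 pages at seq_page_cost = 1, each loop iteration compares 4 * 4 = 16
against 10 * 1 = 10, so the sequential scan looks cheaper all 44 times.
Lower random_page_cost to 1.5 and the same comparison becomes 4 * 1.5 = 6
against 10, and the index scan wins on every iteration instead.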

> To simulate a scenario when nothing is cached, I stopped PostgreSQL,
> dropped all system cache (sync; echo 3 > /proc/sys/vm/drop_caches),
> restarted PostgreSQL, and ran the query. A sequential scan run took
> 13.70 seconds, while an index scan run took 0.34 seconds, which is
> still 40 times faster.
>
> Also, I tried increasing effective_cache_size from 512MB to 3GB (the
> database size is 2+GB), and it still favor sequential scan. The
> estimated costs did not change at all.
>

Greg Smith had this to say in another thread on this same subject:

effective_cache_size probably doesn't do as much as you suspect. It is used
for one of the computations for whether an index is small enough that it can
likely be read into memory efficiently. It has no impact on caching
decisions outside of that.

This is why the cost for random page access must be fairly accurate. Even
if the index is in memory, *it still needs to access the page of data in
the table referenced by the index*, which is why the cost of random access
must be accurate. That cost is a function of both the performance of your
storage infrastructure and the cache hit rate, and can't really be
computed by the database on the fly. You seem to be looking at data which
shows that random page access is fast and wondering why postgres isn't
doing the right thing, when postgres isn't doing the right thing precisely
because it doesn't know that random page access is fast. Since you don't
have particularly fast storage infrastructure, this is likely a function
of cache hit rate, so you must factor in the eventual load on the db when
setting this value. While random access may be fast in a lightly loaded
test environment, those accesses will get much more expensive when
competing with other concurrent disk access.

There's another thread currently active on this list (it started on April
12) with subject "Performance" which contains this explanation of what is
going on and why you need to tune these parameters independently of
effective_cache_size:

When the planner decides what execution plan to use,
it computes a 'virtual cost' for different plans and then chooses the
cheapest one.

Decreasing 'random_page_cost' decreases the expected cost of plans
involving index scans, so that at a certain point it seems cheaper than
a plan using sequential scans etc.

You can see this when using EXPLAIN - do it with the original cost
values, then change the values (for that session only) and run EXPLAIN
again. You'll see how the execution plan suddenly changes and starts to
use index scans.

The problem with random I/O is that it's usually much more expensive
than sequential I/O as the drives need to seek etc. The only case when
random I/O is just as cheap as sequential I/O is when all the data is
cached in memory, because within RAM there's no difference between
random and sequential access (right, that's why it's called Random
Access Memory).

So the suggestion in the previous post - setting both random_page_cost
and seq_page_cost to the same value - makes sense, because when the whole
database fits into memory there's no difference and index scans are
favorable.

In this case (the database is much bigger than the available RAM) this
no longer holds - index scans hit the drives, resulting in a lot of
seeks etc. So it's a serious performance killer ...
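
In psql that experiment is only a few statements; roughly (the 1.5 and
the literal id are examples, not recommended values):

SHOW random_page_cost;        -- the configured value
SET random_page_cost = 1.5;   -- changed for this session only
EXPLAIN SELECT EXISTS (SELECT 1 FROM contacts WHERE salutation_id = 1);
RESET random_page_cost;       -- back to the configured value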

Note: I'm not a postgres developer, so I don't often contribute to these
threads for fear of communicating misinformation. I'm sure someone will
speak up if I got it wrong, but I happened to read that other thread this
afternoon and I wasn't sure anyone else would bring it up, so I chimed in.
Take my input with a grain of salt until confirmed by someone with more
knowledge of postgres internals than I possess.

--sam


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Sok Ann Yap <sokann(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-28 15:56:10
Message-ID: BANLkTi=hjJ0MiV5WsOTwUfcwK3AGiSmh2A@mail.gmail.com
Lists: pgsql-performance

On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap <sokann(at)gmail(dot)com> wrote:
>
> I understand the need to tune PostgreSQL properly for my use case.
> What I am curious about is, for the data set I have, under what
> circumstances (hardware/workload/cache status/etc) would a sequential
> scan really be faster than an index scan for that particular query?

The sequential scan on contacts can be terminated as soon as the first
matching row is found. If each block of the contacts table contains
one example of each salutation, then the inner sequential scan will
always be very short, and faster than an index scan.

I can engineer this to be the case by populating the table like this:

insert into contacts (salutation_id)
select (generate_series%44+1)::int from generate_series(1,1000000);

Here I get the seq scan being 2.6ms while the index scan is 5.6ms.

Predicting how far the inner scan needs to go would be quite
difficult, and I don't know how the system will do it.

However, when I create and populate simple tables based on your
description, I get the index scan being the lower estimated cost. So
the tables I built are not sufficient to study the matter in detail.
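
For anyone who wants to reproduce this, a minimal setup along those
lines might look like the following (the table and column names are
assumed from the query earlier in the thread, not from the real schema):

CREATE TABLE salutations (id int PRIMARY KEY, name text);
CREATE TABLE contacts (salutation_id int);

INSERT INTO salutations
    SELECT g, 'salutation ' || g::text FROM generate_series(1, 44) g;

-- Lay contacts out so that every block contains each salutation.
INSERT INTO contacts (salutation_id)
    SELECT (g % 44) + 1 FROM generate_series(1, 1000000) g;

CREATE INDEX contacts_salutation_id_idx ON contacts (salutation_id);
ANALYZE;

-- Compare the plan and timing for the original query.
EXPLAIN ANALYZE
SELECT s.id, s.name,
       EXISTS (SELECT 1 FROM contacts c WHERE c.salutation_id = s.id) AS in_use
FROM salutations s;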

Cheers,

Jeff