Re: The science of optimization in practical terms?

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: jd(at)commandprompt(dot)com, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bernd Helmle <mailings(at)oopsware(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The science of optimization in practical terms?
Date: 2009-02-15 18:16:32
Message-ID: Pine.GSO.4.64.0902151238070.1312@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 13 Feb 2009, Robert Haas wrote:

> Gather statistics on relation access patterns and use that to estimate
> the fraction of a relation likely to be in cache.

At one point I had a hacked background writer that collected statistics
about the contents of the buffer cache. Since it's obtaining a lock on
the buffer header anyway, it's a perfectly good place to note what
relfileid the buffer is associated with. If you set aside some fixed
amount of space to hold information about the most popular relations
(perhaps using a continuous top-k model, see
http://www.mysmu.edu/faculty/kyriakos/topk-SIGMOD06.pdf ), you can end up
with enough data to estimate how much data in shared_buffers exists for
the most cached relations in there.

In a typical recommended tuning nowadays, we can only expect that to
sample about 1/3 of the total caching happening (presuming
shared_buffers=1/4 RAM and effective_cache_size~=3/4 RAM). While in
general it's nice to think that shared_buffers has a similar makeup to
what the OS is caching, it's not hard to discover common cases where this
would not be the case. Particularly given the VACUUM/seq scan ring-buffer
improvements in 8.3, it's easy to imagine scanning a table that's
2*shared_buffers in size showing only 256KB in shared_buffers, while the
whole thing is available in the OS cache.

I had a eureka moment where I realized I could hook the buffer eviction
code to model that. Decrement the count for that relation in the main
top-k count, then have a second count that assumes the last
2*shared_buffers evicted are also still cached. That would accurately
model the ring-buffer case and improve the quality of the model in
general. Updating those stats on every eviction would add some overhead,
but if the background writer is doing enough of them for you that should
at least be asynchronous from when most backends are blocked waiting for
an eviction.

And that's as far as I got before I had to return to real work again.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2009-02-15 19:32:02 Re: The science of optimization in practical terms?
Previous Message Tom Lane 2009-02-15 17:21:54 Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.