Re: Scaling shared buffer eviction

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-22 05:13:47
Message-ID: 541FB00B.4050703@gmail.com
Lists: pgsql-hackers

On 9/16/14, 8:18 AM, Amit Kapila wrote:
> I think the main reason for the slight difference is that when the size
> of shared buffers is almost the same as the data size, the number of
> buffers needed from the clock sweep is very small. As an example, in the
> first case (shared buffers of 12286MB), it actually needs at most
> 256 additional buffers (2MB) via the clock sweep, whereas bgreclaimer
> will put 2000 additional buffers in the free list (the high watermark,
> since 0.5% of shared buffers is greater than 2000), so bgreclaimer does
> some extra work when it is not required.

This is exactly what I was warning about; it's the sort of lesson learned
from the last round of such tuning. There will be spots where tuning the
code to be aggressive on the hard cases works great. But you need to make
that behavior dynamic to some degree, so that the code doesn't waste a
lot of time sweeping buffers when the demand for them is actually weak.
Otherwise, all sorts of cases that look like this one will get slower.

We should be able to tell these situations apart if there's enough
instrumentation and solid logic inside the program itself, though. The
8.3-era BGW coped with a lot of these issues using a particular style of
moving average with a fast reaction time, plus instrumenting the buffer
allocation rate as accurately as it could. So before getting into
high/low watermark questions, are you comfortable that there's a clear,
accurate number that measures the activity level that's important here?
And have you considered ways it might be averaged over time, or a
history that's analyzed? The fast-rise / slow-decay weighted moving
average approach of the 8.3 BGW, the thing that tried to smooth the
erratic data set possible here, was a pretty critical part of getting it
to auto-tune to the workload size. It ended up being much more important
than the work of setting the arbitrary watermark levels.
