Re: Scaling shared buffer eviction

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-22 06:55:15
Message-ID: CAA4eK1KVMCKPVKkQDcJAw07w1yum_NHggq4hWVT5dR7iwRzu5A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 22, 2014 at 10:43 AM, Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
wrote:

> On 9/16/14, 8:18 AM, Amit Kapila wrote:
>
>> I think the main reason for slight difference is that
>> when the size of shared buffers is almost same as data size, the number
>> of buffers it needs from clock sweep are very less, as an example in first
>> case (when size of shared buffers is 12286MB), it actually needs at most
>> 256 additional buffers (2MB) via clock sweep, where as bgreclaimer
>> will put 2000 (high water mark) additional buffers (0.5% of shared buffers
>> is greater than 2000 ) in free list, so bgreclaimer does some extra work
>> when it is not required
>>
> This is exactly what I was warning about, as the sort of lesson learned
> from the last round of such tuning. There are going to be spots where
> trying to tune the code to be aggressive on the hard cases will work
> great. But you need to make that dynamic to some degree, such that the
> code doesn't waste a lot of time sweeping buffers when the demand for them
> is actually weak. That will make all sorts of cases that look like this
> slower.
>

To verify whether the above can lead to any kind of regression, I have
checked the cases (workload 0.05 or 0.1 percent larger than shared
buffers) where we need only a few extra buffers and bgreclaimer might
add more buffers than are required. It turns out that in those cases
too there is a win, especially at high concurrency. The results are
posted upthread
(http://www.postgresql.org/message-id/CAA4eK1LFGcvzMdcD5NZx7B2gCbP1G7vWK7w32EZk=VOOLUds-A@mail.gmail.com).

> We should be able to tell these apart if there's enough instrumentation
> and solid logic inside of the program itself though. The 8.3 era BGW coped
> with a lot of these issues using a particular style of moving average with
> fast reaction time, plus instrumenting the buffer allocation rate as
> accurately as it could. So before getting into high/low water note
> questions, are you comfortable that there's a clear, accurate number that
> measures the activity level that's important here?

Very good question. This was exactly what was missing in my initial
versions (about 2 years back, when I first tried to solve this
problem), but based on Robert's and Andres's feedback I realized that
we need an accurate number to measure the activity level (in this case,
the consumption of buffers from the freelist), so I have introduced
logic to calculate it (it is stored in a new variable,
numFreeListBuffers, in the BufferStrategyControl structure).
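
As a rough illustration of the idea (this is not the patch's code), the
counter can be kept in step with every push to and pop from the
freelist, so that its value at any moment says how many buffers are
still available. Only the name numFreeListBuffers comes from the
description above; the struct, the array, and the function names are
hypothetical, and the locking a real shared-memory structure would need
is left out.

```c
#include <assert.h>

#define SKETCH_NBUFFERS 8       /* hypothetical pool size for the sketch */

/* Hypothetical, simplified stand-in for BufferStrategyControl. */
typedef struct SketchStrategyControl
{
    int firstFreeBuffer;        /* head of the free list, -1 if empty */
    int numFreeListBuffers;     /* buffers currently on the free list */
} SketchStrategyControl;

/* Per-buffer link to the next free buffer (assumption of this sketch). */
static int sketch_next_free[SKETCH_NBUFFERS];

/* Reclaimer side: put a buffer on the free list and count it. */
void
sketch_freelist_push(SketchStrategyControl *sc, int buf_id)
{
    assert(buf_id >= 0 && buf_id < SKETCH_NBUFFERS);
    sketch_next_free[buf_id] = sc->firstFreeBuffer;
    sc->firstFreeBuffer = buf_id;
    sc->numFreeListBuffers++;
}

/* Backend side: take a buffer off the free list, or return -1 if empty. */
int
sketch_freelist_pop(SketchStrategyControl *sc)
{
    int buf_id = sc->firstFreeBuffer;

    if (buf_id < 0)
        return -1;              /* caller would fall back to clock sweep */
    sc->firstFreeBuffer = sketch_next_free[buf_id];
    sc->numFreeListBuffers--;
    return buf_id;
}
```

The difference between two readings of the counter, plus whatever was
pushed in between, gives the number of buffers consumed in that
interval.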

> And have you considered ways it might be averaging over time or have a
> history that's analyzed?

The current logic of bgreclaimer is such that even if it does
some extra work in one cycle (the extra is tightly bounded),
it will not start another cycle until backends have consumed all the
buffers that it made available in that cycle.
I think the algorithm designed for bgreclaimer therefore averages
out automatically based on activity. Do you see any cases where it
will not do so?
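
The cycle described above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the patch itself: the
high water mark is taken as the smaller of 0.5% of shared buffers and
2000, as in the example quoted at the top of this mail; the low water
mark is a caller-supplied parameter (0 corresponds to "backends have
consumed everything"); and the function names are hypothetical.

```c
#include <assert.h>

#define SKETCH_HIGH_WATER_CAP 2000  /* cap from the example quoted above */

/* High water mark: 0.5% of shared buffers, but never more than the cap. */
int
sketch_high_water_mark(int nbuffers)
{
    int half_percent;

    assert(nbuffers > 0);
    half_percent = nbuffers / 200;
    return half_percent < SKETCH_HIGH_WATER_CAP ? half_percent
                                                : SKETCH_HIGH_WATER_CAP;
}

/*
 * Number of buffers the reclaimer should add in this cycle.  It stays
 * idle (returns 0) until the free list has drained to the low water
 * mark, then refills up to the high water mark.
 */
int
sketch_buffers_to_reclaim(int num_free, int low_water, int high_water)
{
    assert(num_free >= 0);
    assert(low_water >= 0 && low_water <= high_water);
    if (num_free > low_water)
        return 0;               /* demand is weak, do no work */
    return high_water - num_free;
}
```

With 12286MB of shared buffers and 8kB pages there are 1572608 buffers;
0.5% of that is 7863, so the cap of 2000 applies. The extra work per
cycle is bounded by that number, and no further work happens until
backends have actually used the buffers.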

> The exact fast approach / slow decay weighted moving average approach of
> the 8.3 BGW, the thing that tried to smooth the erratic data set possible
> here, was a pretty critical part of getting itself auto-tuning to workload
> size. It ended up being much more important than the work of setting the
> arbitrary watermark levels.
>
>
Agreed, but the logic with which bgwriter works is quite different,
and that's why it needs a different kind of logic to handle auto-tuning.
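
For comparison, the fast approach / slow decay average Greg describes
can be sketched like this. It is a minimal sketch of the general
technique, not a copy of the bgwriter code: the function name is
hypothetical, and the number of smoothing samples is a parameter chosen
by the caller.

```c
#include <assert.h>

/*
 * Fast approach / slow decay moving average of the allocation rate.
 * A sample at or above the current average replaces it immediately;
 * a lower sample only pulls the average down by 1/samples of the gap.
 */
double
sketch_smooth_alloc(double smoothed, double recent, double samples)
{
    assert(samples >= 1.0);
    if (recent >= smoothed)
        return recent;          /* react to a burst at once */
    return smoothed + (recent - smoothed) / samples;   /* decay slowly */
}
```

For example, with 16 samples, an average of 100 followed by a reading
of 20 moves only to 95, while a reading of 200 moves it straight to 200.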

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
