Re: Move unused buffers to freelist

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move unused buffers to freelist
Date: 2013-06-27 13:50:32
Message-ID: CA+TgmoZK_-riTdfY=9e4k8WyHNMQZH-WHLw6Uf=B7gT18tBeEA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Contention-wise I agree. What I have seen is that we have a huge
> amount of cacheline bouncing around the buffer header spinlocks.

How did you measure that?

> I have previously added some ad hoc instrumentation that printed the
> number of buffers that were required (by other backends) during a
> bgwriter cycle and the number of buffers that the buffer manager could
> actually write out.

I think you can see how many are needed from buffers_alloc. No?

> I don't think I actually found any workload where
> the bgwriter actually wrote out a relevant percentage of the necessary
> pages.

Check.

> Problems with the current code:
>
> * doesn't manipulate the usage_count and never does anything to used
> pages. Which means it will just about never find a victim buffer in a
> busy database.

Right. I was thinking that was part of this patch, but it isn't. I
think we should definitely add that. In other words, the background
writer's job should be to run the clock sweep and add buffers to the
free list. I think we should also split the lock: a spinlock for the
freelist, and an lwlock for the clock sweep.
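
To make that concrete, here is a rough sketch of the split I'm
imagining (the lock names and struct layout are invented, completely
untested):

    typedef struct
    {
        /* freelist proper: short push/pop critical sections */
        slock_t     freelist_lock;      /* new spinlock, invented name */
        int         firstFreeBuffer;
        int         lastFreeBuffer;

        /* clock sweep state, only taken when the freelist is empty */
        LWLockId    clocksweep_lock;    /* new lwlock, invented name */
        int         nextVictimBuffer;
    } BufferStrategyControl;

    /* backend fast path: pop a buffer under the spinlock only */
    SpinLockAcquire(&StrategyControl->freelist_lock);
    if (StrategyControl->firstFreeBuffer >= 0)
    {
        buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
        StrategyControl->firstFreeBuffer = buf->freeNext;
        buf->freeNext = FREENEXT_NOT_IN_LIST;
    }
    SpinLockRelease(&StrategyControl->freelist_lock);

The background writer would run the sweep under clocksweep_lock and
push victims onto the list under freelist_lock, so backends normally
only contend on the short spinlock section.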

> * not nearly aggressive enough; touches only a few buffers ahead of the
> clock sweep.

Check. Fixing this might be a separate patch, but then again maybe
not. The changes we're talking about here provide a natural feedback
mechanism: if we observe that the freelist is empty (or below some
threshold, say 32 buffers?), set the background writer's latch, because
we know it's not keeping up.
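
Something along these lines in the freelist-pop path, where the
threshold, the free-buffer counter, and the latch pointer are all
invented for illustration:

    #define FREELIST_WAKEUP_THRESHOLD   32  /* number pulled out of the air */

    /* after popping a buffer: wake the bgwriter if the list is running low */
    if (StrategyControl->numFreeBuffers < FREELIST_WAKEUP_THRESHOLD &&
        StrategyControl->bgwriterLatch != NULL)
        SetLatch(StrategyControl->bgwriterLatch);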

> * does not advance the clock sweep, so the individual backends will
> touch the same buffers again and bounce the buffer spinlock
> cachelines around

Yes, I think that should be fixed as part of this patch too. It's
obviously connected to the point about usage counts.

> * The adaptation logic it has is so slow that it takes
> several minutes to adapt.

Yeah. I don't know if fixing that will fall naturally out of these
other changes or not, but I think it's a second-order concern in any
event.

> There's another thing we could do to noticeably improve scalability of
> buffer acquisition. Currently we do a huge amount of work under the
> freelist lock.
> In StrategyGetBuffer:
>     LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
>     ...
>     /* check freelist, will usually be empty */
>     ...
>     for (;;)
>     {
>         buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
>
>         ++StrategyControl->nextVictimBuffer;
>
>         LockBufHdr(buf);
>         if (buf->refcount == 0)
>         {
>             if (buf->usage_count > 0)
>             {
>                 buf->usage_count--;
>             }
>             else
>             {
>                 /* Found a usable buffer */
>                 if (strategy != NULL)
>                     AddBufferToRing(strategy, buf);
>                 return buf;
>             }
>         }
>         UnlockBufHdr(buf);
>     }
>
> So, we perform the entire clock sweep until we find a single buffer we
> can use inside a *global* lock. At times we need to iterate over all of
> shared buffers up to BM_MAX_USAGE_COUNT (5) times until we have pushed
> the usage counts down enough (if the database is busy it can take even
> longer...).
> In a busy database where usually all the usage counts are high, the next
> backend will touch a lot of those buffers again, which causes massive
> cache eviction & bouncing.
>
> It seems far more sensible to only protect the clock sweep's
> nextVictimBuffer with a spinlock. With some care all the rest can happen
> without any global interlock.

That's a lot more spinlock acquire/release cycles, but it might work
out to a win anyway. Or it might lead to the system suffering a
horrible spinlock-induced death spiral on eviction-heavy workloads.
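
For comparison, the global critical section would shrink to just
advancing the hand; roughly this (victim_lock is an invented spinlock):

    static int
    ClockSweepTick(void)
    {
        int     victim;

        /* only the counter update happens under the global spinlock */
        SpinLockAcquire(&StrategyControl->victim_lock);
        victim = StrategyControl->nextVictimBuffer;
        if (++StrategyControl->nextVictimBuffer >= NBuffers)
        {
            StrategyControl->nextVictimBuffer = 0;
            StrategyControl->completePasses++;
        }
        SpinLockRelease(&StrategyControl->victim_lock);

        return victim;
    }

Each backend then takes LockBufHdr() on its candidate outside any
global lock, which is exactly where the extra spinlock traffic I'm
worried about would show up.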

> I think even after fixing this - which we definitely should do - having
> a sensible/more aggressive bgwriter moving pages onto the freelist makes
> sense because then backends don't need to deal with dirty pages.

Or scanning to find evictable pages.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
