Re: Scaling shared buffer eviction

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-23 23:42:53
Message-ID: 20140923234253.GJ2521@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-09-23 19:21:10 -0400, Robert Haas wrote:
> On Tue, Sep 23, 2014 at 6:54 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I think it might be possible to construct some cases where the spinlock
> > performs worse than the lwlock. But I think those will be clearly in the
> > minority. And at least some of those will be fixed by bgwriter.

Err, this should read bgreclaimer, not bgwriter.

> > As an
> > example consider a single backend COPY ... FROM of large files with a
> > relatively large s_b. That's a seriously bad workload for the current
> > victim buffer selection because frequently most of the needs to be
> > searched for a buffer with usagecount 0. And this patch will double the
> > amount of atomic ops during that.
>
> It will actually be far worse than that, because we'll acquire and
> release the spinlock for every buffer over which we advance the clock
> sweep, instead of just once for the whole thing.

I said double, because we already acquire the buffer header's spinlock
every tick.
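
To put that in code, here's a minimal, self-contained sketch of what I
mean (this is *not* the actual freelist/bufmgr code; the structures and
the clock_hand_lock name are invented for illustration). Every tick
already takes the buffer header's spinlock; putting the clock hand
behind its own spinlock adds a second lock acquisition per tick:

/*
 * Simplified model of the clock sweep, NOT the real PostgreSQL code.
 * It only illustrates the point: every tick already takes the buffer
 * header's spinlock, so guarding the shared clock hand with another
 * spinlock roughly doubles the atomic operations per tick.
 */
#include <pthread.h>
#include <stdio.h>

#define NBUFFERS 128

typedef struct
{
	pthread_spinlock_t hdr_lock;	/* stand-in for the buffer header spinlock */
	int			refcount;
	int			usage_count;
} FakeBufferDesc;

static FakeBufferDesc buffers[NBUFFERS];
static pthread_spinlock_t clock_hand_lock;	/* the lock such a patch would add */
static int	clock_hand;

/* One victim-buffer search; two lock acquisitions per tick. */
static int
fake_clock_sweep(void)
{
	for (;;)
	{
		FakeBufferDesc *buf;
		int			victim;

		/* atomic op #1 per tick: advance the shared clock hand */
		pthread_spin_lock(&clock_hand_lock);
		victim = clock_hand;
		clock_hand = (clock_hand + 1) % NBUFFERS;
		pthread_spin_unlock(&clock_hand_lock);

		/* atomic op #2 per tick: the buffer header spinlock we take today */
		buf = &buffers[victim];
		pthread_spin_lock(&buf->hdr_lock);
		if (buf->refcount == 0 && buf->usage_count == 0)
		{
			pthread_spin_unlock(&buf->hdr_lock);
			return victim;		/* usable victim found */
		}
		if (buf->usage_count > 0)
			buf->usage_count--;	/* otherwise just age the buffer */
		pthread_spin_unlock(&buf->hdr_lock);
	}
}

int
main(void)
{
	pthread_spin_init(&clock_hand_lock, PTHREAD_PROCESS_PRIVATE);
	for (int i = 0; i < NBUFFERS; i++)
	{
		pthread_spin_init(&buffers[i].hdr_lock, PTHREAD_PROCESS_PRIVATE);
		buffers[i].usage_count = 5;	/* all buffers "hot": worst case for the sweep */
	}
	printf("victim: buffer %d\n", fake_clock_sweep());
	return 0;
}

(Builds with cc -pthread; in a workload where usagecount-zero buffers
are rare, nearly every tick pays both acquisitions.)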

> That's reason to hope that a smart bgreclaimer process may be a good
> improvement on top of this.

Right. That's what I was trying to say with bgreclaimer above.

> But I think it's right to view that as something we need to test vs.
> the baseline established by this patch.

Agreed. I think the possible downsides are at the very fringe of already
bad cases. That's why I agreed that you should go ahead.

> > Let me try to quantify that.
>
> Please do.

I've managed to find a ~1.5% performance regression, but the setup was
plain absurd: COPY ... FROM /tmp/... BINARY; of large bytea datums into
a fillfactor 10 table with the column set to PLAIN storage, with the
resulting table size chosen so it's considerably bigger than s_b but
smaller than the kernel's dirty writeback limit.

That's perfectly reasonable.

I can think of a couple other cases, but they're all similarly absurd.

> >> What's interesting is that I can't see in the perf output any real
> >> sign that the buffer mapping locks are slowing things down, but they
> >> clearly are, because when I take this patch and also boost
> >> NUM_BUFFER_PARTITIONS to 128, the performance goes up:
> >
> > What events did you profile?
>
> cs.

Ah. My guess is that most of the time is probably spent in the lwlock's
spinlock, not in the lwlock putting itself to sleep.
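
To illustrate, here's a heavily simplified, self-contained model (not
the real lwlock.c; the names are invented): every acquisition first
takes the lock's internal spinlock to inspect and update the state, and
only when the lock can't be granted does the backend go to sleep on its
semaphore. With many short-held acquisitions, the time therefore shows
up as spinning on that mutex rather than as sleeping:

/*
 * Simplified model of an lwlock, NOT the real lwlock.c: the point is
 * only that every acquisition goes through the lock's internal
 * spinlock ("mutex") before the backend ever considers sleeping.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct
{
	pthread_spinlock_t mutex;	/* internal spinlock every acquisition takes */
	bool		held;			/* exclusively held? (shared mode omitted) */
	int			nwaiters;		/* crude stand-in for the wait queue */
	sem_t		sem;			/* where a backend would actually sleep */
} FakeLWLock;

static void
fake_lwlock_acquire(FakeLWLock *lock)
{
	for (;;)
	{
		bool		mustwait;

		/* under contention, this spin is where the cycles go */
		pthread_spin_lock(&lock->mutex);
		if (!lock->held)
		{
			lock->held = true;
			mustwait = false;
		}
		else
		{
			lock->nwaiters++;
			mustwait = true;
		}
		pthread_spin_unlock(&lock->mutex);

		if (!mustwait)
			return;				/* got the lock without ever sleeping */

		sem_wait(&lock->sem);	/* only now does the "backend" go to sleep */
	}
}

static void
fake_lwlock_release(FakeLWLock *lock)
{
	bool		wake;

	pthread_spin_lock(&lock->mutex);
	lock->held = false;
	wake = (lock->nwaiters > 0);
	if (wake)
		lock->nwaiters--;
	pthread_spin_unlock(&lock->mutex);

	if (wake)
		sem_post(&lock->sem);	/* wake one waiter; it retries from the top */
}

int
main(void)
{
	FakeLWLock	lock;

	pthread_spin_init(&lock.mutex, PTHREAD_PROCESS_PRIVATE);
	lock.held = false;
	lock.nwaiters = 0;
	sem_init(&lock.sem, 0, 0);

	fake_lwlock_acquire(&lock);
	fake_lwlock_release(&lock);
	printf("uncontended acquire/release: spinlock taken twice, no sleep\n");
	return 0;
}

(Also builds with cc -pthread; under contention a profile of this model
would mostly attribute the time to pthread_spin_lock(), not sem_wait().)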

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
