Re: Clock sweep not caching enough B-Tree leaf pages?

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2014-04-16 09:18:52
Message-ID: 20140416091852.GA16358@awork2.anarazel.de

On 2014-04-16 01:58:23 -0700, Peter Geoghegan wrote:
> On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I think this is unfortunately completely out of the question. For one, a
> > gettimeofday() for every buffer pin will become a significant performance
> > problem. Even the computation of the xact/stmt start/stop timestamps
> > shows up pretty heavily in profiles today - and they are far less
> > frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
> > is implemented as something more lightweight than a full syscall.
>
> Come on, Andres. Of course exactly what I've done here is completely
> out of the question as a patch that we can go and commit right now.
> I have numerous caveats about bloating the buffer descriptors, and about
> it being a proof of concept. I'm pretty sure we can come up with a
> scheme to significantly cut down on the number of gettimeofday() calls
> if it comes down to it. In any case, I'm interested in advancing our
> understanding of the problem right now. Let's leave the minutiae to
> one side for the time being.

*I* don't think any scheme that involves measuring the time around
buffer pins is going to be acceptable. It's better that I say that now
rather than after you've invested significant time into the approach, no?
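
As a rough illustration of the scale of that overhead (a standalone toy
benchmark, not PostgreSQL code; a bare counter bump stands in for the pin
fast path, and the absolute numbers will obviously vary by machine):

/* Rough microbenchmark: counter bump vs. bump + gettimeofday() per "pin". */
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

#define ITERATIONS 10000000

static double
elapsed_seconds(struct timeval start, struct timeval end)
{
    return (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
}

int
main(void)
{
    volatile uint64_t usage_count = 0;  /* stand-in for the per-buffer counter */
    struct timeval start, end, now;

    /* Baseline: just bump a counter, roughly what a pin's fast path does. */
    gettimeofday(&start, NULL);
    for (int i = 0; i < ITERATIONS; i++)
        usage_count++;
    gettimeofday(&end, NULL);
    printf("bump only:           %.3f s\n", elapsed_seconds(start, end));

    /* With one gettimeofday() per "pin", as the proof of concept adds. */
    gettimeofday(&start, NULL);
    for (int i = 0; i < ITERATIONS; i++)
    {
        gettimeofday(&now, NULL);
        usage_count++;
    }
    gettimeofday(&end, NULL);
    printf("bump + gettimeofday: %.3f s\n", elapsed_seconds(start, end));

    return 0;
}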

> > The other significant problem I see with this is that it's not adaptive
> > to the actual throughput of buffers in s_b. In many cases there are
> > hundreds of clock cycles through shared buffers in 3 seconds. By only
> > increasing the usagecount that often you've destroyed the little
> > semblance of a working LRU there is right now.
>
> If a usage_count can get to BM_MAX_USAGE_COUNT from its initial
> allocation within an instant, that's bad. It's that simple. Consider
> all the ways in which that can happen almost by accident.

Yes, I agree that that's a problem. The usagecount immediately going down
to zero is a problem as well though, and that's what will happen in many
scenarios, because you put a time limit on increasing the usagecount but
none on decreasing it.
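
To make that concrete, here's a toy standalone simulation (not the actual
patch; all the intervals are made up) of one buffer that is pinned 100 times
a second while the clock sweep passes it 10 times a second and bumps are
limited to once per 3 seconds:

/* Toy simulation of one buffer under the discussed policy. */
#include <stdio.h>

#define BM_MAX_USAGE_COUNT  5
#define BUMP_INTERVAL_MS    3000    /* assumed bump throttle window */
#define SWEEP_INTERVAL_MS   100     /* assume 10 clock passes per second */
#define PIN_INTERVAL_MS     10      /* buffer is pinned 100 times per second */

int
main(void)
{
    int     usage_count = 1;        /* count at initial allocation */
    long    last_bump_ms = 0;

    for (long t = 1; t <= 10000; t++)   /* simulate 10 seconds in 1 ms steps */
    {
        /* Frequent pins, but a bump is allowed only once per window. */
        if (t % PIN_INTERVAL_MS == 0 && t - last_bump_ms >= BUMP_INTERVAL_MS)
        {
            if (usage_count < BM_MAX_USAGE_COUNT)
                usage_count++;
            last_bump_ms = t;
        }

        /* Every sweep pass decrements, with no corresponding time limit. */
        if (t % SWEEP_INTERVAL_MS == 0 && usage_count > 0)
            usage_count--;

        if (t % 1000 == 0)
            printf("t=%2lds usage_count=%d\n", t / 1000, usage_count);
    }

    return 0;
}

Despite being hot, the buffer's usagecount spends essentially all of its
time at zero, which is exactly the failure mode I mean.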

> > It also wouldn't work well for situations with a fast changing
> > workload >> s_b. If you have frequent queries that take a second or so
> > and access some data repeatedly (index nodes or whatnot) only increasing
> > the usagecount once will mean they'll continually fall back to disk access.
>
> No, it shouldn't, because there is a notion of buffers getting a fair
> chance to prove themselves.

If you have a workload with > (BM_MAX_USAGE_COUNT + 1) clock
cycles/second, how does *any* buffer get a chance to prove itself?
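
To put rough numbers on that: BM_MAX_USAGE_COUNT is currently 5, and as
noted above there can be hundreds of clock passes through shared buffers in
3 seconds. With a once-per-3-seconds limit a buffer can gain at most one
count per window while losing one on every pass, so even a buffer that is
pinned on every single pass drains from 5 to 0 within a handful of passes
and then stays there - access frequency stops mattering for eviction at all.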

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
