From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 15:09:13
Message-ID: 20140622150913.GJ30721@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-06-22 12:38:04 +0100, Simon Riggs wrote:
> On 9 April 2014 15:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> >>> An orthogonal issue I noted is that we never check for overflow in the ref
> >>> count itself. While I understand that overflowing an int32 counter would
> >>> take a large number of pins on the same buffer, it can still happen in the
> >>> worst case, no? Or is there a theoretical limit on the number of pins on
> >>> the same buffer by a single backend?
> >
> >> I think we'll die much earlier, because the resource owner array keeping
> >> track of buffer pins will be larger than 1GB.
> >
> > The number of pins is bounded, more or less, by the number of scan nodes
> > in your query plan. You'll have run out of memory trying to plan the
> > query, assuming you live that long.
>
> ISTM that there is a strong possibility that the last buffer pinned
> will be the next buffer to be unpinned. We can use that to optimise
> this.

> If we store the last 8 buffers pinned in the fast array then we will
> be very likely to hit the right buffer just by scanning the array.
>
> So if we treat the fast array as a circular LRU, we get
> * pinning a new buffer when array has an empty slot is O(1)
> * pinning a new buffer when array is full causes us to move the LRU
> into the hash table and then use that element
> * unpinning a buffer will most often be O(1), which then leaves an
> empty slot for next pin
>
> Doing it that way means all usage is O(1), apart from when we use more
> than 8 pins concurrently and the usage does not follow the regular pattern.

Even that case is O(1) on average, since insertion into a hashtable is
amortized O(1)...

I've started working on a patch that pretty much works like that. It
doesn't move things around in the array, because that seemed to perform
badly. That makes sense, because reordering would require moving entries
even in the relatively common case of just two pages being pinned.
Instead, when the array is full, it takes one array entry (chosen by
[someint++ % NUM_ENTRIES]), moves it to the hashtable, and puts the new
item in the now-free slot. The same happens if a lookup hits an entry in
the hashtable: one entry is moved from the array into the hashtable, and
the entry from the hashtable takes the freed slot.
That seems to work nicely, but needs some cleanup. And benchmarks.
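
To make that concrete, here's a rough standalone sketch of the scheme
(all names, the array size, and the trivial chained hashtable are made
up for illustration, not taken from the actual patch):

    #include <stdio.h>
    #include <stdlib.h>

    typedef int Buffer;             /* 0 means "no buffer" */

    typedef struct PinEntry
    {
        Buffer      buffer;         /* 0 means "slot unused" */
        int         refcount;
        struct PinEntry *next;      /* hash chain link, unused in the array */
    } PinEntry;

    #define FAST_ARRAY_SIZE 8
    #define HASH_BUCKETS 64

    static PinEntry fast_array[FAST_ARRAY_SIZE];
    static PinEntry *hash_buckets[HASH_BUCKETS];
    static unsigned clock_hand;     /* the "someint++ % NUM_ENTRIES" counter */

    /* Move an entry into the overflow hashtable. */
    static void
    hash_insert(Buffer buf, int refcount)
    {
        unsigned    b = (unsigned) buf % HASH_BUCKETS;
        PinEntry   *e = malloc(sizeof(PinEntry));

        e->buffer = buf;
        e->refcount = refcount;
        e->next = hash_buckets[b];
        hash_buckets[b] = e;
    }

    /* Remove buf from the hashtable, returning its refcount, or -1. */
    static int
    hash_delete(Buffer buf)
    {
        unsigned    b = (unsigned) buf % HASH_BUCKETS;
        PinEntry  **p;

        for (p = &hash_buckets[b]; *p; p = &(*p)->next)
        {
            if ((*p)->buffer == buf)
            {
                PinEntry   *e = *p;
                int         refcount = e->refcount;

                *p = e->next;
                free(e);
                return refcount;
            }
        }
        return -1;
    }

    /*
     * Return the fast-array slot tracking buf, creating it if needed.
     * Whenever a slot is needed and none is free, a round-robin victim
     * is evicted into the hashtable; nothing is ever reordered in place.
     */
    static PinEntry *
    get_pin_entry(Buffer buf)
    {
        PinEntry   *free_slot = NULL;
        int         i, refcount;

        /* common case: the buffer is already in the fast array */
        for (i = 0; i < FAST_ARRAY_SIZE; i++)
        {
            if (fast_array[i].buffer == buf)
                return &fast_array[i];
            if (fast_array[i].buffer == 0)
                free_slot = &fast_array[i];
        }

        /* no free slot: evict one entry, chosen round-robin */
        if (free_slot == NULL)
        {
            free_slot = &fast_array[clock_hand++ % FAST_ARRAY_SIZE];
            hash_insert(free_slot->buffer, free_slot->refcount);
        }

        /* pull an existing entry back out of the hashtable, if any */
        refcount = hash_delete(buf);
        free_slot->buffer = buf;
        free_slot->refcount = (refcount >= 0) ? refcount : 0;
        return free_slot;
    }

    int
    main(void)
    {
        /* pin twelve buffers, overflowing the eight-entry array */
        for (Buffer b = 1; b <= 12; b++)
            get_pin_entry(b)->refcount++;
        /* buffer 8 was evicted; re-pinning pulls it back from the hashtable */
        get_pin_entry(8)->refcount++;
        printf("buffer 8 pinned %d times\n", get_pin_entry(8)->refcount);
        return 0;
    }

Because the victim is picked by a plain wrapping counter, nothing ever
has to be shifted around within the array, which is the property
described above.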

> We need to do something about this. We have complaints (via Heikki)
> that we are using too much memory in idle backends and small configs,
> plus we know we are using too much memory in larger servers. Reducing
> the memory usage here will reduce CPU L2 cache churn as well as
> increase available RAM.

Yea, the buffer pin array currently is one of the biggest sources of
cache misses... In contrast to things like the buffer descriptors, it's
not even shared between concurrent processes, so it's more wasteful,
even if small.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
