[RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-03-21 18:22:31
Message-ID: 20140321182231.GA17111@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

I've been annoyed at the amount of memory used by the backend local
PrivateRefCount array for a couple of reasons:

a) The performance impact of AtEOXact_Buffers() on Assert() enabled
builds is really, really annoying.
b) On larger nodes, the L1/2/3 cache impact of randomly accessing a
several-megabytes-big array at high frequency is noticeable. I've
seen access to that array be the primary (yes, really) source of
pipeline stalls.
c) On nodes with significant shared memory the sum of the per-backend
arrays is a significant amount of memory that could very well be
used more beneficially.

So what I have done in the attached proof of concept is to have a small
array (currently 8 entries) of (buffer, pincount) pairs that's searched
linearly when the refcount of a buffer is needed. When more than 8 buffers
are pinned, a hashtable is used to look up the values.
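
To make that a bit more concrete, here's a rough sketch of the data
structures and the lookup path (simplified; names and details in the
attached patch may differ):

typedef struct PrivateRefCountEntry
{
    Buffer  buffer;     /* buffer this backend has pinned */
    int32   refcount;   /* number of pins this backend holds on it */
} PrivateRefCountEntry;

#define REFCOUNT_ARRAY_ENTRIES 8    /* roughly one cache line */

/* small, linearly searched fast-path array; unused slots hold InvalidBuffer */
static PrivateRefCountEntry PrivateRefCountArray[REFCOUNT_ARRAY_ENTRIES];

/* dynahash overflow table, only created once it's actually needed */
static HTAB *PrivateRefCountHash = NULL;

/* sketch: this backend's pin count for a buffer; array first, then hash */
static int32
GetPrivateRefCount(Buffer buffer)
{
    PrivateRefCountEntry *res;
    int         i;

    for (i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
    {
        if (PrivateRefCountArray[i].buffer == buffer)
            return PrivateRefCountArray[i].refcount;
    }

    if (PrivateRefCountHash == NULL)
        return 0;

    res = hash_search(PrivateRefCountHash, (void *) &buffer, HASH_FIND, NULL);
    return res != NULL ? res->refcount : 0;
}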

That seems to work fairly well. On the few tests I could run on my
laptop - I've done this during a flight - it's a small performance win
in all cases I could test. While saving a fair amount of memory.

Alternatively we could just get rid of the idea of tracking this per
backend, relying on tracking via resource managers...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Make-backend-local-tracking-of-buffer-pins-more-effi.patch text/x-patch 16.7 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 09:34:42
Message-ID: CA+U5nMK1SoSbWm_i024KMLn+AX_mNm7Z4+bp-93rjp0m9gk0dw@mail.gmail.com
Lists: pgsql-hackers

On 21 March 2014 14:22, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> That seems to work fairly well. On the few tests I could run on my
> laptop - I've done this during a flight - it's a small performance win
> in all cases I could test. While saving a fair amount of memory.

We've got to the stage now that saving this much memory is essential,
so this patch is a must-have.

The patch does all I would expect and no more, so approach and details
look good to me.

Performance? Discussed many years ago, but I suspect the micro-tuning
of those earlier patches wasn't as good as it is here.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 09:44:04
Message-ID: 20140409094404.GC4161@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-09 05:34:42 -0400, Simon Riggs wrote:
> On 21 March 2014 14:22, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > That seems to work fairly well. On the few tests I could run on my
> > laptop - I've done this during a flight - it's a small performance win
> > in all cases I could test. While saving a fair amount of memory.
>
> We've got to the stage now that saving this much memory is essential,
> so this patch is a must-have.

I think some patch like this is necessary - I am not 100% sure mine is
the one true approach here, but it certainly seems simple enough.

> Performance? Discussed many years ago, but I suspect the micro-tuning
> of those earlier patches wasn't as good as it is here.

It's a small win on small machines (my laptop, 16GB), so we need to
retest with 128GB shared_buffers or so on bigger ones. On those,
PrivateRefCount previously was the source of a large portion of the
cache misses...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 12:22:15
Message-ID: CA+TgmobBuOgaP438yNo5PKHGXRhOhS3MWdEF5kxJm8mC5q1O7A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 9, 2014 at 5:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> We've got to the stage now that saving this much memory is essential,
> so this patch is a must-have.
>
> The patch does all I would expect and no more, so approach and details
> look good to me.
>
> Performance? Discussed many years ago, but I suspect the micro-tuning
> of those earlier patches wasn't as good as it is here.

I think this approach is practically a slam-dunk when the number of
pins is small (as it typically is). I'm less clear what happens when
we overflow from the small array into the hashtable. That certainly
seems like it could be a loss, but how do we construct such a case to
test it? A session with lots of suspended queries? Can we generate a
regression by starting a few suspended queries to use up the array
elements, and then running a scan that pins and unpins many buffers?

One idea is: if we fill up all the array elements and still need
another one, evict all the elements to the hash table and then start
refilling the array. The advantage of that over what's done here is
that the active scan will always be using an array slot rather than
repeated hash table manipulations. I guess you'd still have to probe
the hash table repeatedly, but you'd avoid entering and removing items
frequently.
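
In rough pseudo-C, the overflow handling I have in mind would be something
like this (variable and helper names are invented just for illustration):

/* illustrative only: when the array is full, spill everything and refill */
if (used_array_entries == REFCOUNT_ARRAY_ENTRIES)
{
    int     i;

    for (i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
        MoveEntryToHashTable(&PrivateRefCountArray[i]); /* made-up helper */
    used_array_entries = 0;
}
new_entry = &PrivateRefCountArray[used_array_entries++];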

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 12:32:33
Message-ID: 20140409123233.GG4161@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-09 08:22:15 -0400, Robert Haas wrote:
> On Wed, Apr 9, 2014 at 5:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > We've got to the stage now that saving this much memory is essential,
> > so this patch is a must-have.
> >
> > The patch does all I would expect and no more, so approach and details
> > look good to me.
> >
> > Performance? Discussed many years ago, but I suspect the micro-tuning
> > of those earlier patches wasn't as good as it is here.
>
> I think this approach is practically a slam-dunk when the number of
> pins is small (as it typically is). I'm less clear what happens when
> we overflow from the small array into the hashtable. That certainly
> seems like it could be a loss, but how do we construct such a case to
> test it? A session with lots of suspended queries? Can we generate a
> regression by starting a few suspended queries to use up the array
> elements, and then running a scan that pins and unpins many buffers?

I've tried to reproduce problems around this (when I wrote this), but
it's really hard to construct cases that need more than 8 pins. I've
tested performance for those cases by simply not using the array, and
while the performance suffers a bit, it's not that bad.

> One idea is: if we fill up all the array elements and still need
> another one, evict all the elements to the hash table and then start
> refilling the array. The advantage of that over what's done here is
> that the active scan will always be using an array slot rather than
> repeated hash table manipulations. I guess you'd still have to probe
> the hash table repeatedly, but you'd avoid entering and removing items
> frequently.

We could do that, but my gut feeling is that it's not necessary. There'd
have to be some heuristic to avoid doing that all the time; otherwise we'd
probably regress.
I think the fact that we pin/unpin very frequently will keep frequently
used pins in the array most of the time.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 12:43:29
Message-ID: CABOikdP2=0MXzy-jTo0s2tzZ_UKRhN7m1J=dtZQY=6T-wMNGng@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 9, 2014 at 6:02 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

>
>
> I've tried to reproduce problems around this (when I wrote this), but
> it's really hard to construct cases that need more than 8 pins. I've
> tested performance for those cases by simply not using the array, and
> while the performance suffers a bit, it's not that bad.
>
>
AFAIR this was suggested before and got rejected because constructing that
worst case and proving that the approach does not perform too badly was a
challenge. Having said that, I agree it's time to avoid that memory
allocation, especially with a large number of backends running with large
shared buffers.

An orthogonal issue I noted is that we never check for overflow in the ref
count itself. While I understand that overflowing an int32 counter will take
a large number of pins on the same buffer, it can still happen in the worst
case, no? Or is there a theoretical limit on the number of pins on the
same buffer by a single backend?
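
Just to illustrate the kind of check I mean - a sketch only, where "ref" and
"buffer" stand in for whatever the patch uses at the place the count is
incremented:

/* sketch: guard against the per-backend pin count wrapping around */
if (ref->refcount >= INT_MAX)
    elog(ERROR, "too many pins on buffer %d in this backend", buffer);
ref->refcount++;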

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 12:49:28
Message-ID: 20140409124928.GH4161@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> On Wed, Apr 9, 2014 at 6:02 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I've tried to reproduce problems around this (when I wrote this), but
> > it's really hard to construct cases that need more than 8 pins. I've
> > tested performance for those cases by simply not using the array, and
> > while the performance suffers a bit, it's not that bad.

> AFAIR this was suggested before and got rejected because constructing that
> worst case and proving that the approach does not perform too badly was a
> challenge. Having said that, I agree its time to avoid that memory
> allocation, especially with large number of backends running with large
> shared buffers.

Well, I've tested the worst case by making *all* pins go through the
hash table. And it didn't regress too badly, although it *was* visible
in the profile.
I've searched the archive and to my knowledge nobody has actually sent a
patch implementing this sort of scheme for pins, although there's been
talk about various ways to solve this.

> An orthogonal issue I noted is that we never check for overflow in the ref
> count itself. While I understand overflowing int32 counter will take a
> large number of pins on the same buffer, it can still happen in the worst
> case, no ? Or is there a theoretical limit on the number of pins on the
> same buffer by a single backend ?

I think we'll die much earlier, because the resource owner array keeping
track of buffer pins will be larger than 1GB.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 13:17:59
Message-ID: CA+TgmoYtXwvF93csLkqtnZTJHWEPa9Dv_W0D8kjzpNqAM9Qygg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 9, 2014 at 8:32 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I've tried to reproduce problems around this (when I wrote this), but
> it's really hard to construct cases that need more than 8 pins. I've
> tested performance for those cases by simply not using the array, and
> while the performance suffers a bit, it's not that bad.

Suspended queries won't do it?

Also, it would be good to quantify "not that bad". Actually, this
thread is completely lacking any actual benchmark results...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 13:38:34
Message-ID: 20140409133834.GI4161@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-09 09:17:59 -0400, Robert Haas wrote:
> On Wed, Apr 9, 2014 at 8:32 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I've tried to reproduce problems around this (when I wrote this), but
> > it's really hard to construct cases that need more than 8 pins. I've
> > tested performance for those cases by simply not using the array, and
> > while the performance suffers a bit, it's not that bad.
>
> Suspended queries won't do it?

What exactly do you mean by "suspended" queries? Defined and started
portals? Recursive query execution?

> Also, it would be good to quantify "not that bad".

The 'not bad' comes from my memory of the benchmarks I'd done after
about 12h of flying around ;).

Yes, it needs real benchmarks. Probably won't get to it the next few
days tho.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 13:59:07
Message-ID: CA+TgmoZdpx0t2WDXrBoROb1mSFvtUbghr2xbURBZmzOo1fzHfg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 9, 2014 at 9:38 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-04-09 09:17:59 -0400, Robert Haas wrote:
>> On Wed, Apr 9, 2014 at 8:32 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > I've tried to reproduce problems around this (when I wrote this), but
>> > it's really hard to construct cases that need more than 8 pins. I've
>> > tested performance for those cases by simply not using the array, and
>> > while the performance suffers a bit, it's not that bad.
>>
>> Suspended queries won't do it?
>
> What exactly do you mean by "suspended" queries? Defined and started
> portals? Recursive query execution?

Open a cursor and fetch from it; leave it open while doing other things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 14:09:44
Message-ID: 20396.1397052584@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
>> An orthogonal issue I noted is that we never check for overflow in the ref
>> count itself. While I understand overflowing int32 counter will take a
>> large number of pins on the same buffer, it can still happen in the worst
>> case, no ? Or is there a theoretical limit on the number of pins on the
>> same buffer by a single backend ?

> I think we'll die much earlier, because the resource owner array keeping
> track of buffer pins will be larger than 1GB.

The number of pins is bounded, more or less, by the number of scan nodes
in your query plan. You'll have run out of memory trying to plan the
query, assuming you live that long.

The resource managers are interesting to bring up in this context.
That mechanism didn't exist when PrivateRefCount was invented.
Is there a way we could lay off the work onto the resource managers?
(I don't see one right at the moment, but I'm under-caffeinated still.)

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 14:19:22
Message-ID: 20140409141922.GJ4161@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-09 10:09:44 -0400, Tom Lane wrote:
> The resource managers are interesting to bring up in this context.
> That mechanism didn't exist when PrivateRefCount was invented.
> Is there a way we could lay off the work onto the resource managers?
> (I don't see one right at the moment, but I'm under-caffeinated still.)

Yea, that's something I've also considered, but I couldn't come up with
a performant and not overly complicated way to do it.
There are some nasty issues with pins held by different ResourceOwners and
such, so even if we could provide sensible random access to check for
existing pins, it wouldn't be a simple thing.

It's not unreasonable to argue that we just shouldn't optimize for
several pins held by the same backend on the same buffer and always touch
the global count. Thanks to resource managers the old reason for
PrivateRefCount, which was the need to be able to clean up remaining pins
in case of error, doesn't exist anymore.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 14:26:25
Message-ID: 20760.1397053585@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> It's not unreasonable to argue that we just shouldn't optimize for
> several pins held by the same backend on the same buffer and always touch
> the global count.

NAK. That would be a killer because of increased contention for buffer
headers. The code is full of places where a buffer's PrivateRefCount
jumps up and down a bit, for example when transferring a tuple into a
TupleTableSlot. (I said upthread that the number of pins is bounded by
the number of scan nodes, but actually it's probably some small multiple
of that --- eg a seqscan would hold its own pin on the current buffer,
and there'd be a slot or two holding the current tuple, each with its
own pin count.)

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-04-09 14:30:50
Message-ID: 20140409143050.GK4161@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-09 10:26:25 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > It's not unreasonable to argue that we just shouldn't optimize for
> > several pins held by the same backend on the same buffer and always touch
> > the global count.
>
> NAK.

Note I didn't implement it because I wasn't too convinced either ;)

> That would be a killer because of increased contention for buffer
> headers. The code is full of places where a buffer's PrivateRefCount
> jumps up and down a bit, for example when transferring a tuple into a
> TupleTableSlot.

On the other hand in those scenarios the backend is pretty likely to
already have the cacheline locally in exclusive mode...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 11:38:04
Message-ID: CA+U5nM+1rsEa7=1k3_KQa8_dCY0PmqMNcXipEjb5H5vGm+1ZgA@mail.gmail.com
Lists: pgsql-hackers

On 9 April 2014 15:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
>>> An orthogonal issue I noted is that we never check for overflow in the ref
>>> count itself. While I understand overflowing int32 counter will take a
>>> large number of pins on the same buffer, it can still happen in the worst
>>> case, no ? Or is there a theoretical limit on the number of pins on the
>>> same buffer by a single backend ?
>
>> I think we'll die much earlier, because the resource owner array keeping
>> track of buffer pins will be larger than 1GB.
>
> The number of pins is bounded, more or less, by the number of scan nodes
> in your query plan. You'll have run out of memory trying to plan the
> query, assuming you live that long.

ISTM that there is a strong possibility that the last buffer pinned
will be the next buffer to be unpinned. We can use that to optimise
this.

If we store the last 8 buffers pinned in the fast array then we will
be very likely to hit the right buffer just by scanning the array.

So if we treat the fast array as a circular LRU, we get
* pinning a new buffer when array has an empty slot is O(1)
* pinning a new buffer when array is full causes us to move the LRU
into the hash table and then use that element
* unpinning a buffer will most often be O(1), which then leaves an
empty slot for next pin

Doing it that way means all usage is O(1) apart from when we use >8
pins concurrently and that usage does not follow the regular pattern.

> The resource managers are interesting to bring up in this context.
> That mechanism didn't exist when PrivateRefCount was invented.
> Is there a way we could lay off the work onto the resource managers?
> (I don't see one right at the moment, but I'm under-caffeinated still.)

Me neither. Good idea, but I think it would take a lot of refactoring
to do that.

We need to do something about this. We have complaints (via Heikki)
that we are using too much memory in idle backends and small configs,
plus we know we are using too much memory in larger servers. Reducing
the memory usage here will reduce CPU L2 cache churn as well as
increasing available RAM.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 15:09:13
Message-ID: 20140622150913.GJ30721@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-06-22 12:38:04 +0100, Simon Riggs wrote:
> On 9 April 2014 15:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> >>> An orthogonal issue I noted is that we never check for overflow in the ref
> >>> count itself. While I understand overflowing int32 counter will take a
> >>> large number of pins on the same buffer, it can still happen in the worst
> >>> case, no ? Or is there a theoretical limit on the number of pins on the
> >>> same buffer by a single backend ?
> >
> >> I think we'll die much earlier, because the resource owner array keeping
> >> track of buffer pins will be larger than 1GB.
> >
> > The number of pins is bounded, more or less, by the number of scan nodes
> > in your query plan. You'll have run out of memory trying to plan the
> > query, assuming you live that long.
>
> ISTM that there is a strong possibility that the last buffer pinned
> will be the next buffer to be unpinned. We can use that to optimise
> this.

> If we store the last 8 buffers pinned in the fast array then we will
> be very likely to hit the right buffer just by scanning the array.
>
> So if we treat the fast array as a circular LRU, we get
> * pinning a new buffer when array has an empty slot is O(1)
> * pinning a new buffer when array is full causes us to move the LRU
> into the hash table and then use that element
> * unpinning a buffer will most often be O(1), which then leaves an
> empty slot for next pin
>
> Doing it that way means all usage is O(1) apart from when we use >8
> pins concurrently and that usage does not follow the regular pattern.

Even that case is O(1) in the average case since insertion into a
hashtable is O(1) on average...

I've started working on a patch that pretty much works like that. It
doesn't move things around in the array, because that seemed to perform
badly. That seems to make sense, because it'd require moving entries in
the relatively common case of two pages being pinned.
It takes one array entry (chosen via [someint++ % NUM_ENTRIES]), moves it
to the hashtable, and puts the new item in the now-free slot. The same
happens if a lookup hits an entry from the hashtable: it moves one entry
from the array into the hashtable and puts the entry from the hashtable in
the freed slot.
That seems to work nicely, but needs some cleanup. And benchmarks.

> We need to do something about this. We have complaints (via Heikki)
> that we are using too much memory in idle backends and small configs,
> plus we know we are using too much memory in larger servers. Reducing
> the memory usage here will reduce CPU L2 cache churn as well as
> increasing available RAM.

Yea, the buffer pin array currently is one of the biggest sources of
cache misses... In contrast to things like the buffer descriptors it's
not even shared between concurrent processes, so it's more wasteful,
even if small.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 18:31:34
Message-ID: CA+U5nMLi8nb4-1_LM2pzLbqr_h9ZaFBBFjhyA43Hs_stb6p8xg@mail.gmail.com
Lists: pgsql-hackers

On 22 June 2014 16:09, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

>> So if we treat the fast array as a circular LRU, we get
>> * pinning a new buffer when array has an empty slot is O(1)
>> * pinning a new buffer when array is full causes us to move the LRU
>> into the hash table and then use that element
>> * unpinning a buffer will most often be O(1), which then leaves an
>> empty slot for next pin
>>
>> Doing it that way means all usage is O(1) apart from when we use >8
>> pins concurrently and that usage does not follow the regular pattern.
>
> Even that case is O(1) in the average case since insertion into a
> hashtable is O(1) on average...
>
> I've started working on a patch that pretty much works like that. It
> doesn't move things around in the array, because that seemed to perform
> badly. That seems to make sense, because it'd require moving entries in
> the relatively common case of two pages being pinned.
> It moves one array entry (chosen by [someint++ % NUM_ENTRIES] and moves
> it to the hashtable and puts the new item in the now free slot. Same
> happens if a lookup hits an entry from the hashtable. It moves one
> entry from the array into the hashtable and puts the entry from the
> hashtable in the free slot.

Yes, that's roughly how the SLRU code works also, so sounds good.

> That seems to work nicely, but needs some cleanup. And benchmarks.

ISTM that microbenchmarks won't reveal the beneficial L2 and RAM
effects of the patch, so I suggest we just need to do a pgbench, a
2-way nested join and a 10-way nested join with an objective of no
significant difference or better. The RAM and L2 effects are enough to
justify this, since it will help with both very small and very large
configs.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 18:40:23
Message-ID: 20140622184023.GO30721@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-06-22 19:31:34 +0100, Simon Riggs wrote:
> Yes, that's roughly how the SLRU code works also, so sounds good.

Heh. I rather see that as an argument for it sounding bad :)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-26 23:52:29
Message-ID: 20140826235229.GV21544@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-03-21 19:22:31 +0100, Andres Freund wrote:
> Hi,
>
> I've been annoyed at the amount of memory used by the backend local
> PrivateRefCount array for a couple of reasons:
>
> a) The performance impact of AtEOXact_Buffers() on Assert() enabled
> builds is really, really annoying.
> b) On larger nodes, the L1/2/3 cache impact of randomly accessing
> several megabyte big array at a high frequency is noticeable. I've
> seen the access to that to be the primary (yes, really) source of
> pipeline stalls.
> c) On nodes with significant shared_memory the sum of the per-backend
> arrays is a significant amount of memory, that could very well be
> used more beneficially.
>
> So what I have done in the attached proof of concept is to have a small
> (8 currently) array of (buffer, pincount) that's searched linearly when
> the refcount of a buffer is needed. When more than 8 buffers are pinned
> a hashtable is used to lookup the values.
>
> That seems to work fairly well. On the few tests I could run on my
> laptop - I've done this during a flight - it's a small performance win
> in all cases I could test. While saving a fair amount of memory.

Here's the next version of this patch. The major change is that newly
pinned/looked up buffers always go into the array, even when we're already
spilling into the hash table. To get a free slot a preexisting entry (chosen
via PrivateRefCountArray[PrivateRefCountClock++ %
REFCOUNT_ARRAY_ENTRIES]) is displaced into the hash table. That way the
concern that frequently used buffers get 'stuck' in the hashtable while
infrequently used ones are in the array is ameliorated.
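
In a simplified sketch (hash table creation, and the case where the displaced
entry is the one currently being looked up, are omitted; the function name is
just for illustration), reserving an array slot now works roughly like this:

static uint32 PrivateRefCountClock = 0;

static PrivateRefCountEntry *
ReservePrivateRefCountEntry(void)
{
    PrivateRefCountEntry *victim;
    PrivateRefCountEntry *hashent;
    bool        found;
    int         i;

    /* use a free array slot if there is one */
    for (i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
    {
        if (PrivateRefCountArray[i].buffer == InvalidBuffer)
            return &PrivateRefCountArray[i];
    }

    /* none free: displace one entry, chosen round-robin, into the hash table */
    victim = &PrivateRefCountArray[PrivateRefCountClock++ % REFCOUNT_ARRAY_ENTRIES];

    hashent = hash_search(PrivateRefCountHash, (void *) &victim->buffer,
                          HASH_ENTER, &found);
    Assert(!found);
    hashent->refcount = victim->refcount;

    /* the displaced slot is now free for the new entry */
    victim->buffer = InvalidBuffer;
    victim->refcount = 0;

    return victim;
}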

The biggest concern previously was the lack of benchmarks. I'm not entirely
sure where to get a good testcase for this that's not completely
artificial - most simpler testcases don't pin many buffers. I've played
around a bit and it's a slight performance win in pgbench read-only and
mixed workloads, but not enough to get excited about on its own.

When asserts are enabled, the story is different. The admittedly extreme
case of read-only pgbench at scale 350, with 6GB shared_buffers and 128
clients, goes from 3204.489825 to 39277.077448 TPS. So a) above is
definitely improved :)

The memory savings are clearly visible. During a pgbench scale 350, -cj
128 read-only run, the following shell snippet
for pid in $(pgrep -U andres postgres); do
    grep VmData /proc/$pid/status;
done | \
    awk 'BEGIN { sum = 0 } {sum += $2;} END { if (NR > 0) print sum/NR; else print 0; print sum; print NR}'

shows:

before:
AVG: 4626.06
TOT: 619892
NR: 134

after:
AVG: 1610.37
TOT: 217400
NR: 135

So, the patch is succeeding on c).

On its own, in pgbench scale 350 -cj 128 -S -T10 the numbers are:
before:
166171.039778, 165488.531353, 165045.182215, 161492.094693 (excluding connections establishing)
after
175812.388869, 171600.928377, 168317.370893, 169860.008865 (excluding connections establishing)

so, a bit of a performance win.

-j 16, -c 16 -S -T10:
before:
159757.637878 161287.658276 164003.676018 160687.951017 162941.627683
after:
160628.774342 163981.064787 151239.151102 164763.851903 165219.220209

I'm too tired to continue with write tests now, but I don't see a
reason why they should be more meaningful... We really need a test with
more complex queries, I'm afraid.

Anyway, I think at this stage this needs somebody to look closely at the
code. I don't think there are going to be any really surprising
performance revelations here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Make-backend-local-tracking-of-buffer-pins-memory-ef.patch text/x-patch 19.6 KB

From: Jim Nasby <jim(at)nasby(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 01:44:32
Message-ID: 53FD3800.5000504@nasby.net
Lists: pgsql-hackers

On 8/26/14, 6:52 PM, Andres Freund wrote:
> On 2014-03-21 19:22:31 +0100, Andres Freund wrote:
>> >Hi,
>> >
>> >I've been annoyed at the amount of memory used by the backend local
>> >PrivateRefCount array for a couple of reasons:
>> >
>> >a) The performance impact of AtEOXact_Buffers() on Assert() enabled
>> > builds is really, really annoying.
>> >b) On larger nodes, the L1/2/3 cache impact of randomly accessing
>> > several megabyte big array at a high frequency is noticeable. I've
>> > seen the access to that to be the primary (yes, really) source of
>> > pipeline stalls.
>> >c) On nodes with significant shared_memory the sum of the per-backend
>> > arrays is a significant amount of memory, that could very well be
>> > used more beneficially.
>> >
>> >So what I have done in the attached proof of concept is to have a small
>> >(8 currently) array of (buffer, pincount) that's searched linearly when
>> >the refcount of a buffer is needed. When more than 8 buffers are pinned
>> >a hashtable is used to lookup the values.
>> >
>> >That seems to work fairly well. On the few tests I could run on my
>> >laptop - I've done this during a flight - it's a small performance win
>> >in all cases I could test. While saving a fair amount of memory.
> Here's the next version of this patch. The major change is that newly

<snip>

> The memory savings are clearly visible. During a pgbench scale 350, -cj
> 128 readonly run the following awk
> for pid in $(pgrep -U andres postgres); do
> grep VmData /proc/$pid/status;
> done | \
> awk 'BEGIN { sum = 0 } {sum += $2;} END { if (NR > 0) print sum/NR; else print 0;print sum;print NR}'
>
> shows:
>
> before:
> AVG: 4626.06
> TOT: 619892
> NR: 134
>
> after:
> AVG: 1610.37
> TOT: 217400
> NR: 135

These results look very encouraging, especially thinking about the cache impact. It occurs to me that it'd also be nice to have some stats available on how this is performing; perhaps a dtrace probe for whenever we overflow to the hash table, and one that shows maximum usage for a statement? (Presumably that's not much extra code or overhead...)
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 02:04:03
Message-ID: CA+TgmobcNGBpL+UKza4aXcOsX-=y2PCmX=ysayKvknA0VTS=BA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 26, 2014 at 7:52 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Here's the next version of this patch.

+ * much never requried. So we keep a small array of reference counts

Typo. But I think you could just drop the whole sentence about how
things used to be, especially since it's recapitulated elsewhere.

+#define REFCOUNT_ARRAY_ENTRIES 8 /* one full cacheline */

Obviously that's not always going to be the case. You could say
"about", or just drop the comment. Shouldn't "cache line" be two
words?

+ * refcounts are kept track of in the array, after that new array entries

s/, after that/; after that,/

+ if (!found && !create)
+ else if (!found && free != NULL)
+ else if (!found)
+ else if (found && !do_move)
+ else if (found && free != NULL)
+ else if (found)
+ Assert(false); /* unreachable */
+ return res;

There's not much point in testing found when you've already handled
the not-found cases. But I'd reorganize this whole thing like this:

if (!found) { if (!create) { return; } if (free != NULL) { stuff; return; } stuff; return; }
if (!do_move) { return; }
if (free != NULL) { stuff; return; }
stuff; return;

+ * Stop tracking the refcount of the buffer ref is tracking the refcount
+ * for. Nono, there's no circularity here.

Incomprehensible babble. Perhaps: "Release resources used to track
the reference count of a buffer which we no longer have pinned."

That's all I see on a first-read through. There might be other
issues, and I haven't checked through it in great detail for mundane
bugs, but generally, I favor pressing on relatively rapidly toward a
commit. It seems highly likely that this idea is a big win, and if
there's some situation in which it's a loss, we're more likely to find
out with it in the tree (and thus likely to be tested by many more
people) than by analysis from first principles.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 02:19:47
Message-ID: 16762.1409105987@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> The biggest concern previously were some benchmarks. I'm not entirely
> sure where to get a good testcase for this that's not completely
> artificial - most simpler testcases don't pin many buffers.

FWIW, I think that's by design; we don't ever want to pin more than one
buffer per relation+index in use in a given query. You could certainly
build complicated queries joining many tables in order to push up the
number of pinned buffers, but whether buffer pin manipulations would be
the bottleneck in such cases is pretty dubious.

I would say that the issue most deserving of performance testing is your
sizing of the linear-search array --- it's not obvious that 8 is a good
size.

Another thing to think about: a way to get to larger numbers of pinned
buffers without especially-complex queries is to have nested queries,
such as SQL queries inside plpgsql functions inside outer queries.
Does the patch logic take any advantage of the locality-of-reference
that will occur in such scenarios?

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 06:15:41
Message-ID: 20140827061541.GW21544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-08-26 22:19:47 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > The biggest concern previously were some benchmarks. I'm not entirely
> > sure where to get a good testcase for this that's not completely
> > artificial - most simpler testcases don't pin many buffers.
>
> FWIW, I think that's by design; we don't ever want to pin more than one
> buffer per relation+index in use in a given query.

Right.

> You could certainly
> build complicated queries joining many tables in order to push up the
> number of pinned buffers, but whether buffer pin manipulations would be
> the bottleneck in such cases is pretty dubious.

Yea, I actually tried that and I didn't see anything.

> I would say that the issue most deserving of performance testing is your
> sizing of the linear-search array --- it's not obvious that 8 is a good
> size.

It's about the size of a cacheline on all common architectures; that's
how I arrived at it. I don't think it makes a very big difference whether we
make it 4 or 12, but outside of that range I think it'll be unlikely to
be beneficial. The regression tests never go above three or four pins or
so currently, so I think that's a number unlikely to be regularly
crossed in practice.

> Another thing to think about: a way to get to larger numbers of pinned
> buffers without especially-complex queries is to have nested queries,
> such as SQL queries inside plpgsql functions inside outer queries.

What I did was hack together a pgbench script that does a lot of
DECLARE c_01 CURSOR FOR SELECT * FROM pg_attribute WHERE ctid = '(0, 1)';
FETCH NEXT FROM c_01;

I couldn't measure a bigger slowdown (as that has to be executed for
every xact) for the new code than for the old one.

> Does the patch logic take any advantage of the locality-of-reference
> that will occur in such scenarios?

Yes. Whenever a buffer that's not in the array is pinned/unpinned, it'll
displace an entry from the array into the hashtable. Even though the
replacement is simplistic/linear, I think that should nearly always end
up with the most recently used buffers in the array.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 06:34:20
Message-ID: 20140827063420.GX21544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-08-26 22:04:03 -0400, Robert Haas wrote:
> On Tue, Aug 26, 2014 at 7:52 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Here's the next version of this patch.
>
> + * much never requried. So we keep a small array of reference counts
>
> Typo. But I think you could just drop the whole sentence about how
> things used to be, especially since it's recapitulated elsewhere.

Ok. I actually wonder about chucking out the whole explanation in
buf_init.c. There's been something there historically, but it's not
really a better place than just keeping everything in bufmgr.c.

> +#define REFCOUNT_ARRAY_ENTRIES 8 /* one full cacheline */
>
> Obviously that's not always going to be the case. You could say
> "about", or just drop the comment. Shouldn't "cache line" be two
> words?

Ok, will make it /* one cache line in common architectures */ - I want
the reasoning for the current size somewhere...

> + * refcounts are kept track of in the array, after that new array entries
>
> s/, after that/; after that,/
>
> + if (!found && !create)
> + else if (!found && free != NULL)
> + else if (!found)
> + else if (found && !do_move)
> + else if (found && free != NULL)
> + else if (found)
> + Assert(false); /* unreachable */
> + return res;
>
> There's not much point in testing found when you've already handled
> the not-found cases. But I'd reorganize this whole thing like this:
>
> if (!found) { if (!create) { return; } if (free != NULL) { stuff;
> return }; stuff; return; }
> if (!do_move) { return; } if (free != NULL) { stuff; return; } stuff; return;

The current if () ... isn't particularly nice, I agree.

> That's all I see on a first-read through. There might be other
> issues, and I haven't checked through it in great detail for mundane
> bugs, but generally, I favor pressing on relatively rapidly toward a
> commit. It seems highly likely that this idea is a big win, and if
> there's some situation in which it's a loss, we're more likely to find
> out with it in the tree (and thus likely to be tested by many more
> people) than by analysis from first principles.

I agree. As long as people are happy with the approach I think we can
iron out performance edge cases later.

I'll try to send a cleaned up version soon. I'm currently wondering
about adding some minimal regression test coverage for this. What I have
right now is stuff like
DECLARE c_01 CURSOR FOR SELECT * FROM pg_attribute WHERE ctid = '(0, 1)';
DECLARE c_02 CURSOR FOR SELECT * FROM pg_attribute WHERE ctid = '(1, 1)';
...
FETCH NEXT FROM c_01;
FETCH NEXT FROM c_02;
...
CLOSE c_01;
...

While that provides some coverage, I'm unconvinced that it's appropriate
for the regression tests?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 06:38:35
Message-ID: 20140827063835.GY21544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-08-26 20:44:32 -0500, Jim Nasby wrote:
> These results look very encouraging, especially thinking about the
> cache impact.

Yep. I've seen PrivateRefCount array accesses show up prominently as a
source of cache misses on big servers.

> It occurs to me that it'd also be nice to have some
> stats available on how this is performing; perhaps a dtrace probe for
> whenever we overflow to the hash table, and one that shows maximum
> usage for a statement? (Presumably that's not much extra code or
> overhead...)

I don't use dtrace, so *I* won't do that. Personally I just dynamically
add probes using "perf probe" when I need to track something like this.

I don't see how you could track maximum usage without more
complications/slowdowns than warranted.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 14:12:35
Message-ID: 20140827141235.GB7046@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Andres Freund wrote:
> On 2014-08-26 22:19:47 -0400, Tom Lane wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:

> > I would say that the issue most deserving of performance testing is your
> > sizing of the linear-search array --- it's not obvious that 8 is a good
> > size.
>
> It's about the size of a cacheline on all common architectures, that's
> how I found it. I don't think it makes a very big difference whether we
> make it 4 or 12, but outside of that range I think it'll be unlikely to
> be beneficial. The regression tests never go about three or four pins or
> so currently, so I think that's a number unlikely to regularly be
> crossed in practice.

FWIW scanning a minmax index will keep three pages pinned IIRC
(metapage, current revmap page, current regular page).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <jim(at)nasby(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-27 18:05:45
Message-ID: 53FE1DF9.3030305@nasby.net
Lists: pgsql-hackers

On 8/27/14, 1:38 AM, Andres Freund wrote:
>> It occurs to me that it'd also be nice to have some
>> >stats available on how this is performing; perhaps a dtrace probe for
>> >whenever we overflow to the hash table, and one that shows maximum
>> >usage for a statement? (Presumably that's not much extra code or
>> >overhead...)
> I don't use dtrace, so*I* won't do that. Personally I just dynamically
> add probes using "perf probe" when I need to track something like this.

Yeah, I didn't mean dtrace directly; don't we have some macro that equates to dtrace or perf-probe depending on architecture?

> I don't see how you could track maximum usage without more
> compliations/slowdowns than warranted.

I was thinking we'd only show maximum if we overflowed, but maybe it's still too much overhead in that case.

In any case, I was thinking this would be trivial to add now, but if it's not then someone can do it when there's actual need.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-08-29 22:08:34
Message-ID: 20140829220834.GJ10109@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-08-26 22:04:03 -0400, Robert Haas wrote:
> That's all I see on a first-read through.

I think I fixed all of them. Thanks.

> There might be other
> issues, and I haven't checked through it in great detail for mundane
> bugs, but generally, I favor pressing on relatively rapidly toward a
> commit. It seems highly likely that this idea is a big win, and if
> there's some situation in which it's a loss, we're more likely to find
> out with it in the tree (and thus likely to be tested by many more
> people) than by analysis from first principles.

Attached is the version I plan to commit after going over it again
tomorrow.

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Make-backend-local-tracking-of-buffer-pins-memory-ef.patch text/x-patch 21.8 KB