
Re: Buffer Allocation Concurrency Limits

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Jason Petersen <jason(at)citusdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Buffer Allocation Concurrency Limits
Date:	2014-04-09 03:01:01
Message-ID:	CAA4eK1Kbc4erA9pw8DjbsEWb9VXwCD1_=BkE5f1Bt5uY42mBDg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 8, 2014 at 10:38 PM, Jason Petersen <jason(at)citusdata(dot)com> wrote:
> In December, Metin (a coworker of mine) discussed an inability to scale a
> simple task (parallel scans of many independent tables) to many cores (it's
> here). As a ramp-up task at Citus I was tasked to figure out what the heck
> was going on here.
>
> I have a pretty extensive writeup here (whose length is more a result of my
> inexperience with the workings of PostgreSQL than anything else) and was
> looking for some feedback.

At the moment I am not able to open the above link (here); there may be some
problem (it shows "Service Unavailable"). I will try again later.

> In short, my conclusion is that a working set larger than memory results in
> backends piling up on BufFreelistLock.

When you say the working set is larger than memory, do you mean *memory* to
refer to shared_buffers?
I think that if the data is larger than the total memory available, the effect
of I/O can overshadow the effect of BufFreelistLock contention anyway.
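
To make the pile-up concrete, here is a simplified toy model of a clock sweep in which one lock, standing in for BufFreelistLock, guards the clock hand for the entire search. It was written for illustration and is not PostgreSQL's StrategyGetBuffer code; the names `ToyPool`, `ToyBuffer` and `clock_sweep_locked` are hypothetical, and the usage-count cap of 5 mirrors BM_MAX_USAGE_COUNT. The point is that every backend needing a victim runs its whole sweep, including every usage-count decrement, while holding the same lock.

```c
#include <pthread.h>

/* Toy model for illustration; not PostgreSQL's BufferDesc or freelist.c. */
#define NBUFFERS 8
#define MAX_USAGE_COUNT 5 /* mirrors BM_MAX_USAGE_COUNT */

typedef struct {
    int usage_count; /* bumped on access, decremented by the sweep */
    int refcount;    /* pinned buffers are never chosen as victims */
} ToyBuffer;

typedef struct {
    ToyBuffer buffers[NBUFFERS];
    int next_victim;      /* the clock hand */
    pthread_mutex_t lock; /* stands in for BufFreelistLock */
} ToyPool;

static void pool_init(ToyPool *pool)
{
    for (int i = 0; i < NBUFFERS; i++) {
        pool->buffers[i].usage_count = 0;
        pool->buffers[i].refcount = 0;
    }
    pool->next_victim = 0;
    pthread_mutex_init(&pool->lock, NULL);
}

/*
 * Find and pin a victim. The whole search runs under one lock, so
 * concurrent callers serialize here regardless of the core count.
 * Returns -1 if no unpinned buffer was found.
 */
static int clock_sweep_locked(ToyPool *pool)
{
    int victim = -1;

    pthread_mutex_lock(&pool->lock);
    /* Enough ticks for any unpinned buffer to age to zero, assuming
     * usage counts never exceed MAX_USAGE_COUNT. */
    for (int tick = 0; tick < NBUFFERS * (MAX_USAGE_COUNT + 1); tick++) {
        int id = pool->next_victim;
        ToyBuffer *buf = &pool->buffers[id];

        pool->next_victim = (id + 1) % NBUFFERS;
        if (buf->refcount > 0)
            continue;           /* in use: skip without aging it */
        if (buf->usage_count > 0) {
            buf->usage_count--; /* recently used: give it another lap */
            continue;
        }
        buf->refcount = 1;      /* pin the victim before releasing the lock */
        victim = id;
        break;
    }
    pthread_mutex_unlock(&pool->lock);
    return victim;
}
```

When most requests miss in shared_buffers, most allocations need such a search, so in this model extra cores mainly add waiters on `pool->lock`.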

> As much as possible I removed
> anything that could be blamed for this:
>
> Hyper-Threading is disabled
> zone reclaim mode is disabled
> numactl was used to ensure interleaved allocation
> kernel.sched_migration_cost was set high to effectively disable migration
> kernel.sched_autogroup_enabled was disabled
> transparent hugepage support was disabled
>
>
> For a way forward, I was thinking the buffer allocation sections could use
> some of the atomics Andres added here. Rather than workers grabbing
> BufFreelistLock to iterate the clock hand until they find a victim, the
> algorithm could be rewritten in a lock-free style, allowing workers to move
> the clock hand in tandem.
>
> Alternatively, the clock iteration could be moved off to a background
> process, similar to what Amit Kapila proposed here.
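
The lock-free variant suggested above can be sketched with plain C11 `<stdatomic.h>` (not PostgreSQL's own atomics layer) under two assumptions: the clock hand is a single atomic counter that each backend advances with `atomic_fetch_add`, and each buffer's pin count and usage count are packed into one 32-bit atomic word, so checking and pinning a victim is a single compare-and-swap. The packing and the names `LockFreePool` and `clock_sweep_lockfree` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy model; the packed layout below is an assumption, not PostgreSQL's. */
#define NBUFFERS 8 /* a power of two, so hand % NBUFFERS survives wraparound */
#define MAX_USAGE_COUNT 5
#define REFCOUNT_MASK 0xFFFFu /* low 16 bits: pin count */
#define USAGE_SHIFT 16        /* high 16 bits: usage count */

typedef struct {
    _Atomic uint32_t state[NBUFFERS];
    _Atomic uint32_t hand; /* ever-increasing tick counter */
} LockFreePool;

static uint32_t make_state(uint32_t usage, uint32_t refcount)
{
    return (usage << USAGE_SHIFT) | (refcount & REFCOUNT_MASK);
}

static void lf_pool_init(LockFreePool *pool)
{
    for (int i = 0; i < NBUFFERS; i++)
        atomic_init(&pool->state[i], make_state(0, 0));
    atomic_init(&pool->hand, 0);
}

/*
 * Each caller claims its own tick with one atomic add, so several
 * backends can examine different buffers at the same time. A failed
 * compare-and-swap means another backend touched that buffer first;
 * the caller simply moves on to its next tick.
 */
static int clock_sweep_lockfree(LockFreePool *pool)
{
    for (int tick = 0; tick < NBUFFERS * (MAX_USAGE_COUNT + 1); tick++) {
        uint32_t slot = atomic_fetch_add(&pool->hand, 1) % NBUFFERS;
        uint32_t old = atomic_load(&pool->state[slot]);
        uint32_t refcount = old & REFCOUNT_MASK;
        uint32_t usage = old >> USAGE_SHIFT;

        if (refcount > 0)
            continue; /* pinned: skip */
        if (usage > 0) {
            /* Age the buffer; losing this race is harmless. */
            (void) atomic_compare_exchange_strong(&pool->state[slot], &old,
                                                  make_state(usage - 1, 0));
            continue;
        }
        /* Unpinned and unused: try to pin it for ourselves. */
        if (atomic_compare_exchange_strong(&pool->state[slot], &old,
                                           make_state(0, 1)))
            return (int) slot;
    }
    return -1; /* gave up; under contention this bound is only a heuristic */
}
```

The hand is still shared, but each backend touches it for one atomic instruction rather than holding a lock across the whole search. Real buffer headers carry more state (flags, the buffer tag) than this packing allows.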

I think both of the above ideas can be useful, but I am not sure they are
sufficient for scaling shared buffers.
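
The background-process alternative can be sketched as a producer/consumer split: a reclaimer runs the clock sweep and queues idle buffers on a free list, and backends only pop from that list, holding its lock for a few instructions instead of a whole sweep. This is a toy model with hypothetical names (`ReclaimPool`, `reclaimer_run`, `backend_get_buffer`). Per-buffer header locks, which a real implementation would need, are omitted, so it is only safe to drive single-threaded as in the usage below.

```c
#include <pthread.h>

/* Toy model with hypothetical names; per-buffer header locks are omitted. */
#define NBUFFERS 8
#define MAX_USAGE_COUNT 5

typedef struct {
    int usage_count;
    int refcount;
    int on_freelist; /* keeps a buffer from being queued twice */
} RBuffer;

typedef struct {
    RBuffer buffers[NBUFFERS];
    int next_victim;               /* clock hand: only the reclaimer moves it */
    int freelist[NBUFFERS];        /* ring of buffer ids ready for reuse */
    int head, count;
    pthread_mutex_t freelist_lock; /* held only for one push or pop */
} ReclaimPool;

static void rpool_init(ReclaimPool *pool)
{
    for (int i = 0; i < NBUFFERS; i++)
        pool->buffers[i] = (RBuffer){0, 0, 0};
    pool->next_victim = 0;
    pool->head = 0;
    pool->count = 0;
    pthread_mutex_init(&pool->freelist_lock, NULL);
}

/* Background side: sweep and queue idle buffers until `target` are queued. */
static int reclaimer_run(ReclaimPool *pool, int target)
{
    int pushed = 0;

    for (int tick = 0; tick < NBUFFERS * (MAX_USAGE_COUNT + 1); tick++) {
        int id = pool->next_victim;
        RBuffer *buf = &pool->buffers[id];
        int queued;

        pthread_mutex_lock(&pool->freelist_lock);
        queued = pool->count;
        pthread_mutex_unlock(&pool->freelist_lock);
        if (queued >= target)
            break;

        pool->next_victim = (id + 1) % NBUFFERS;
        if (buf->refcount > 0 || buf->on_freelist)
            continue;
        if (buf->usage_count > 0) {
            buf->usage_count--;
            continue;
        }
        pthread_mutex_lock(&pool->freelist_lock);
        pool->freelist[(pool->head + pool->count) % NBUFFERS] = id;
        pool->count++;
        buf->on_freelist = 1;
        pthread_mutex_unlock(&pool->freelist_lock);
        pushed++;
    }
    return pushed;
}

/* Foreground side: pop a queued buffer, or return -1 if the list is empty. */
static int backend_get_buffer(ReclaimPool *pool)
{
    for (;;) {
        pthread_mutex_lock(&pool->freelist_lock);
        if (pool->count == 0) {
            pthread_mutex_unlock(&pool->freelist_lock);
            return -1;
        }
        int id = pool->freelist[pool->head];
        RBuffer *buf = &pool->buffers[id];

        pool->head = (pool->head + 1) % NBUFFERS;
        pool->count--;
        buf->on_freelist = 0;
        /* The buffer may have been used after it was queued; recheck. */
        if (buf->refcount == 0 && buf->usage_count == 0) {
            buf->refcount = 1; /* pin it for the caller */
            pthread_mutex_unlock(&pool->freelist_lock);
            return id;
        }
        pthread_mutex_unlock(&pool->freelist_lock);
    }
}
```

In a threaded version, `reclaimer_run` would presumably loop in its own process or thread and sleep when the list is full, and a backend that finds the list empty could fall back to sweeping itself.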

> Is this assessment accurate? I know 9.4 changes a lot about lock
> organization, but last I looked I didn't see anything that could alleviate
> this contention: are there any plans to address this?

I am planning to work on this for 9.5.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Buffer Allocation Concurrency Limits at 2014-04-08 17:08:46 from Jason Petersen
