Re: 60 core performance with 9.3

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 60 core performance with 9.3
Date: 2014-07-01 09:48:35
Message-ID: 53B283F3.7020005@catalyst.net.nz
Lists: pgsql-performance

On 27/06/14 21:19, Andres Freund wrote:
> On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote:
>> My feeling is spinlock or similar, 'perf top' shows
>>
>> kernel find_busiest_group
>> kernel _raw_spin_lock
>>
>> as the top time users.
>
> Those don't tell that much by themselves, could you do a hierarchical
> profile? I.e. perf record -ga? That'll at least give the callers for
> kernel level stuff. For more information compile postgres with
> -fno-omit-frame-pointer.
>

Unfortunately this did not help - there were lots of unknown symbols from
postgres in the profile. I'm guessing the Ubuntu postgresql-9.3 package
needs either the -dev package or to be rebuilt with the enable profile
option (debug and no-omit-frame-pointer seem to be there already).

However, further investigation did uncover *very* interesting things.
Firstly, I had previously said that read only performance looked ok.
That was wrong: it was based purely on comparison to Robert's blog post.
Rebooting the 60 core box with only 32 cores enabled showed that we got
*better* scaling in the read only case, which illustrated that we were
hitting a serious regression with more cores. At this point data is
needed:

Test:       pgbench
Options:    scale 500
            read only
OS:         Ubuntu 14.04
Pg:         9.3.4
Pg Options:
    max_connections = 200
    shared_buffers = 10GB
    maintenance_work_mem = 1GB
    effective_io_concurrency = 10
    wal_buffers = 32MB
    checkpoint_segments = 192
    checkpoint_completion_target = 0.8

Results

Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
--------+------------------+-----------------
      6 |            70400 |            71028
     12 |            98918 |           129140
     24 |           230345 |           240631
     48 |           324042 |           409510
     96 |           346929 |           120464
    192 |           312621 |            92663
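
To make the regression easier to see, here is a minimal Python sketch that computes the 60 core / 32 core tps ratio for each client count. The numbers are copied from the table above; the variable and function names are only illustrative and not part of any pgbench tooling.

```python
# tps figures copied from the 9.3 results table above
clients = [6, 12, 24, 48, 96, 192]
tps_32 = [70400, 98918, 230345, 324042, 346929, 312621]
tps_60 = [71028, 129140, 240631, 409510, 120464, 92663]


def ratios(baseline, candidate):
    """Return candidate/baseline for each pair of tps values."""
    return [c / b for b, c in zip(baseline, candidate)]


ratio_60_vs_32 = ratios(tps_32, tps_60)

# Client count at which the 60 core configuration reaches its best tps
peak_clients_60 = clients[tps_60.index(max(tps_60))]

for n, r in zip(clients, ratio_60_vs_32):
    print(f"{n:>3} clients: 60 core / 32 core = {r:.2f}")
print(f"60 core tps peaks at {peak_clients_60} clients")
```

A ratio below 1.0 means the 60 core configuration is slower than the same box with 32 cores enabled.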

So we have anti scaling with 60 cores once we go past 48 client
connections. Ouch! A level of urgency led to trying out Andres's
'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into the 9.4
branch, building a package from that, and retesting:

Clients | 9.4 tps 60 cores (rwlock)
--------+--------------------------
      6 |                     70189
     12 |                    128894
     24 |                    233542
     48 |                    422754
     96 |                    590796
    192 |                    630672

Wow - that is more like it! Andres, that is some nice work, we definitely
owe you some beers for that :-) I am aware that I need to retest with an
unpatched 9.4 src - as it is not clear from this data how much is due to
Andres's patches and how much to the steady stream of 9.4 development.
I'll post an update on that later, but figured this was interesting
enough to note for now.
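
For a rough sense of the size of the change, a small Python sketch comparing the patched 9.4 numbers against the 9.3 numbers on 60 cores. The figures are copied from the two tables above and the names are illustrative. As noted, the ratio mixes the effect of the rwlock patches with other 9.4 development, so it should not be read as the gain from the patches alone.

```python
# tps figures on 60 cores, copied from the two results tables above
clients = [6, 12, 24, 48, 96, 192]
tps_93_60 = [71028, 129140, 240631, 409510, 120464, 92663]
tps_94_rwlock = [70189, 128894, 233542, 422754, 590796, 630672]

# Ratio of patched 9.4 tps to 9.3 tps at each client count.
# This is NOT an isolated measure of the rwlock patches: unpatched 9.4
# has not been tested yet.
speedup = [new / old for old, new in zip(tps_93_60, tps_94_rwlock)]

# True if tps never drops as the client count increases
scales_monotonically = all(
    later >= earlier
    for earlier, later in zip(tps_94_rwlock, tps_94_rwlock[1:])
)

for n, s in zip(clients, speedup):
    print(f"{n:>3} clients: 9.4 (rwlock) / 9.3 = {s:.2f}")
print(f"9.4 (rwlock) tps never drops with more clients: {scales_monotonically}")
```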

Regards

Mark

[1] from git://git.postgresql.org/git/users/andresfreund/postgres.git,
commits:
4b82477dcaf81ad7b0c102f4b66e479a5eb9504a
10d72b97f108b6002210ea97a414076a62302d4e
67ffebe50111743975d54782a3a94b15ac4e755f
fe686ed18fe132021ee5e557c67cc4d7c50a1ada
f2378dc2fa5b73c688f696704976980bab90c611
