Re: Wait free LW_SHARED acquisition - v0.2

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Wait free LW_SHARED acquisition - v0.2
Date: 2014-02-01 08:51:49
Message-ID: CAM3SWZRgomgpKwrFE_FAB27qHsk=cahEqVbwyHTua6EVShu3Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I thought I'd try out what I was in an immediate position to do
without having access to dedicated multi-socket hardware: A benchmark
on AWS. This was a "c3.8xlarge" instance, which are reportedly backed
by Intel Xeon E5-2680 processors. Since the Intel ARK website reports
that these CPUs have 16 "threads" (8 cores + hyperthreading), and
AWS's marketing material indicates that this instance type has 32
"vCPUs", I inferred that the underlying hardware had 2 sockets.
However, reportedly that wasn't the case when procfs was consulted, no
doubt due to Xen Hypervisor voodoo:

ubuntu(at)ip-10-67-128-2:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 32
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Stepping: 4
CPU MHz: 2800.074
BogoMIPS: 5600.14
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-31

I ran the benchmark on Ubuntu 13.10 server, because that seemed to be
the only prominent "enterprise" x86_64 AMI (operating system image)
that came with GCC 4.8 as part its standard toolchain. This exact
setup is benchmarked here:

http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_c3&num=1

(Incidentally, some of the other benchmarks on that site use pgbench
to benchmark the Linux kernel, filesystems, disks and so on. e.g.:
http://www.phoronix.com/scan.php?page=news_item&px=NzI0NQ).

I was hesitant to benchmark using a virtualized system. There is a lot
of contradictory information about the overhead and/or noise added,
which may vary from one workload or hypervisor to the next. But, needs
must when the devil drives, and all that. Besides, this kind of setup
is very commercially relevant these days, so it doesn't seem
unreasonable to see how things work out on an AWS instance that
generally performs well for this workload. Of course, I still want to
replicate the big improvement you reported for multi-socket systems,
but you might have to wait a while for that, unless some kindly
benefactor that has a 4 socket server lying around would like to help
me out.

Results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/

You can drill down and find the postgresql.conf settings from the
report. There appears to be a modest improvement in transaction
throughput. It's not as large as the improvement you reported for your
2 socket workstation, but it's there, just about.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2014-02-01 09:36:25 Re: Recovery inconsistencies, standby much larger than primary
Previous Message Amit Kapila 2014-02-01 07:01:09 Re: [bug fix] pg_ctl always uses the same event source