Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

From: Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date: 2012-05-24 19:47:10
Message-ID: alpine.LRH.2.02.1205242008390.14366@calx046.ast.cam.ac.uk
Lists: pgsql-hackers

On Thu, 24 May 2012, Robert Haas wrote:

> As you can see, raw performance isn't much worse with the larger data
> sets, but scalability at high connection counts is severely degraded
> once the working set no longer fits in shared_buffers.

Actually, the problem persists even when I trim the dataset so that it fits
within shared_buffers.

Here is the dump (0.5 gig in size, tested with shared_buffers=10G,
work_mem=500Mb):
http://www.ast.cam.ac.uk/~koposov/files/dump.gz
I also attach the script.

For my toy dataset, the run time of a single thread goes
from ~6.4 to 18 seconds (~3 times worse).

Also, while running the script repeatedly on my main machine, I saw, for
some reason, some variation in how much slower the threaded execution
is than a single thread.

I now see 25 seconds for the multi-threaded run versus the same ~6 seconds
for a single thread.

The oprofile output shows:
782355 21.5269 s_lock
782355 100.000 s_lock [self]
-------------------------------------------------------------------------------
709801 19.5305 PinBuffer
709801 100.000 PinBuffer [self]
-------------------------------------------------------------------------------
326457 8.9826 LWLockAcquire
326457 100.000 LWLockAcquire [self]
-------------------------------------------------------------------------------
309437 8.5143 UnpinBuffer
309437 100.000 UnpinBuffer [self]
-------------------------------------------------------------------------------
252972 6.9606 ReadBuffer_common
252972 100.000 ReadBuffer_common [self]
-------------------------------------------------------------------------------
201558 5.5460 LockBuffer
201558 100.000 LockBuffer [self]
------------------------------------------------------------
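As a rough reading aid, the shares in the listing above can be added up. The sketch below assumes the second column is each symbol's percentage of all collected samples; the PROFILE string simply copies the six summary lines quoted above, and parse_profile is a hypothetical helper, not part of oprofile.

```python
# Sketch: sum the per-symbol sample shares quoted in this message.
# Assumption: column 2 is the percentage of all samples for that symbol.
PROFILE = """\
782355 21.5269 s_lock
709801 19.5305 PinBuffer
326457 8.9826 LWLockAcquire
309437 8.5143 UnpinBuffer
252972 6.9606 ReadBuffer_common
201558 5.5460 LockBuffer
"""


def parse_profile(text):
    """Return a list of (symbol, samples, percent) tuples, one per line."""
    rows = []
    for line in text.splitlines():
        samples, percent, symbol = line.split()
        rows.append((symbol, int(samples), float(percent)))
    return rows


rows = parse_profile(PROFILE)
top_share = sum(percent for _, _, percent in rows)
print(f"top {len(rows)} symbols: {top_share:.1f}% of samples")
```

Under that assumption, these six locking and buffer-pinning routines together account for roughly 71% of the samples.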

Interestingly, on another machine with much smaller shared memory
(3G), less RAM (12G), fewer CPUs and PG 9.1 running, I was
consistently getting ~7.2 vs 4.5 sec (multi vs single thread).
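For comparison, the slowdown factors implied by the timings in this message can be worked out as below. This is only a small arithmetic sketch; the labels are mine and all numbers are the approximate values quoted above.

```python
# (label, multi-threaded seconds, single-thread seconds); approximate
# values taken from this message, labels are illustrative only.
timings = [
    ("main machine, first run", 18.0, 6.4),
    ("main machine, later run", 25.0, 6.0),
    ("other machine, PG 9.1", 7.2, 4.5),
]


def slowdown(multi, single):
    """Ratio of multi-threaded run time to single-thread run time."""
    return multi / single


for label, multi, single in timings:
    print(f"{label}: {slowdown(multi, single):.1f}x slower")
```

So the main machine degrades by a factor of about 2.8 to 4.2, while the smaller 9.1 machine degrades by about 1.6.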

PS: Just in case, the CPU on the main machine I'm testing on is a Xeon(R) CPU
E7-4807 (the total number of real cores is 24).

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

Attachment Content-Type Size
script.sh application/x-sh 306 bytes
script.sql text/plain 258 bytes
