User concurrency thresholding: where do I look?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: User concurrency thresholding: where do I look?
Date: 2007-07-19 15:28:59
Message-ID: 200707190828.59088.josh@agliodbs.com
Lists: pgsql-performance

Folks,

I've run into this a number of times with various PostgreSQL users, so we
tested it at Sun. What seems to be happening is that at some specific number
of connections average throughput drops 30% and response time quadruples or
worse. The threshold seems to vary per machine; I've seen it at 95,
1050, 1700 and 2800 connections. Tinkering with postgresql.conf parameters
doesn't seem to affect this threshold.

As an example of this behavior:

Users Txn/User Resp. Time
50 105.38 0.01
100 113.05 0.01
150 114.05 0.01
200 113.51 0.01
250 113.38 0.01
300 112.14 0.01
350 112.26 0.01
400 111.43 0.01
450 110.72 0.01
500 110.44 0.01
550 109.36 0.01
600 107.01 0.02
650 105.71 0.02
700 106.95 0.02
750 107.69 0.02
800 106.78 0.02
850 108.59 0.02
900 106.03 0.02
950 106.13 0.02
1000 64.58 0.15
1050 52.32 0.23
1100 49.79 0.25

Tinkering with shared_buffers has had no effect on this thresholding (the above
was with 3GB to 6GB of shared_buffers). Any ideas on where we should look
for the source of the bottleneck?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 15:44:18
Message-ID: 469F86D2.4080303@commandprompt.com
Lists: pgsql-performance

Josh Berkus wrote:
> Folks,
>

> 650 105.71 0.02
> 700 106.95 0.02
> 750 107.69 0.02
> 800 106.78 0.02
> 850 108.59 0.02
> 900 106.03 0.02
> 950 106.13 0.02
> 1000 64.58 0.15
> 1050 52.32 0.23
> 1100 49.79 0.25
>
> Tinkering with shared_buffers has had no effect on this thresholding (the above
> was with 3GB to 6GB of shared_buffers). Any ideas on where we should look
> for the source of the bottleneck?

I have seen this as well. I always chalked it up to PG having to
manage so many connections, but there is some interesting evidence to
review.

The amount of memory "each" connection takes up. Consider 4-11 meg per
connection depending on various things like number of prepared queries.

Number of CPUs. Obviously 500 connections over 4 CPUs isn't the same as
500 connections over 8 CPUs.

That number of connections generally means a higher transaction
velocity, and a higher velocity means more checkpoint segments. Wrong
settings for checkpoint_segments, the bgwriter and checkpoints will
cause you to start falling down.

I would also note that our experience is that PG falls down a little
higher, more toward 2500 connections last time I checked, but this was
likely on different hardware.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 15:49:12
Message-ID: 20070719154912.GG6538@alvh.no-ip.org
Lists: pgsql-performance

Josh Berkus wrote:
> Folks,
>
> I've run into this a number of times with various PostgreSQL users, so we
> tested it at Sun. What seems to be happening is that at some specific number
> of connections average throughput drops 30% and response time quadruples or
> worse. The threshold seems to vary per machine; I've seen it at 95,
> 1050, 1700 or 2800 connections. Tinkering with postgresql.conf parameters
> doesn't seem to affect this threshold.
>
> As an example of this behavior:
>
> Users Txn/User Resp. Time
> 50 105.38 0.01
> 100 113.05 0.01
> 150 114.05 0.01
> 200 113.51 0.01
> 250 113.38 0.01
> 300 112.14 0.01
> 350 112.26 0.01
> 400 111.43 0.01
> 450 110.72 0.01
> 500 110.44 0.01
> 550 109.36 0.01
> 600 107.01 0.02
> 650 105.71 0.02
> 700 106.95 0.02
> 750 107.69 0.02
> 800 106.78 0.02
> 850 108.59 0.02
> 900 106.03 0.02
> 950 106.13 0.02
> 1000 64.58 0.15
> 1050 52.32 0.23
> 1100 49.79 0.25
>
> Tinkering with shared_buffers has had no effect on this thresholding (the above
> was with 3GB to 6GB of shared_buffers). Any ideas on where we should look
> for the source of the bottleneck?

Have you messed with max_connections and/or max_locks_per_transaction
while testing this? The lock table is sized to max_locks_per_xact times
max_connections, and shared memory hash tables get slower when they are
full. Of course, the saturation point would depend on the avg number of
locks acquired per user, which would explain why you are seeing a lower
number for some users and higher for others (simpler/more complex
queries).

This is just a guess though. No profiling or measuring at all, really.
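
For what it's worth, a quick way to sample the lock table while the test
runs might be a sketch like this (assumes psql access; the database name
is illustrative):

    # Total locks held and the number of backends holding them; compare
    # the total against (max_connections + max_prepared_transactions) *
    # max_locks_per_transaction.
    psql -d testdb -c "SELECT count(*) AS total_locks,
                              count(DISTINCT pid) AS backends
                       FROM pg_locks;"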

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep." (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 16:22:24
Message-ID: 200707190922.24497.josh@agliodbs.com
Lists: pgsql-performance

Alvaro,

> Have you messed with max_connections and/or max_locks_per_transaction
> while testing this? The lock table is sized to max_locks_per_xact times
> max_connections, and shared memory hash tables get slower when they are
> full. Of course, the saturation point would depend on the avg number of
> locks acquired per user, which would explain why you are seeing a lower
> number for some users and higher for others (simpler/more complex
> queries).

That's an interesting thought. Let me check lock counts and see if this is
possibly the case.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 17:25:38
Message-ID: 12686.1184865938@sss.pgh.pa.us
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Alvaro,
>> Have you messed with max_connections and/or max_locks_per_transaction
>> while testing this? The lock table is sized to max_locks_per_xact times
>> max_connections, and shared memory hash tables get slower when they are
>> full. Of course, the saturation point would depend on the avg number of
>> locks acquired per user, which would explain why you are seeing a lower
>> number for some users and higher for others (simpler/more complex
>> queries).

> That's an interesting thought. Let me check lock counts and see if this is
> possibly the case.

AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
space entirely; and the fact that you can continue the curve upwards
says that you're not on the edge of running out. However I agree that
it's worth experimenting with those two parameters to see if the curve
moves around at all.

Another resource that might be interesting is the number of open files.

Also, have you tried watching vmstat or local equivalent to confirm that
the machine's not starting to swap?
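
(A minimal sketch of that check; the column names vary by platform --
si/so on Linux, sr for the page scanner on Solaris:)

    # Watch paging while the user count ramps through the threshold;
    # sustained nonzero swap columns would implicate memory pressure.
    vmstat 5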

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 17:37:04
Message-ID: 20070719173704.GA12084@alvh.no-ip.org
Lists: pgsql-performance

Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > Alvaro,
> >> Have you messed with max_connections and/or max_locks_per_transaction
> >> while testing this? The lock table is sized to max_locks_per_xact times
> >> max_connections, and shared memory hash tables get slower when they are
> >> full. Of course, the saturation point would depend on the avg number of
> >> locks acquired per user, which would explain why you are seeing a lower
> >> number for some users and higher for others (simpler/more complex
> >> queries).
>
> > That's an interesting thought. Let me check lock counts and see if this is
> > possibly the case.
>
> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
> space entirely;

Well, if there still is shared memory available, the lock hash can
continue to grow, but it would slow down according to this comment in
ShmemInitHash:

* max_size is the estimated maximum number of hashtable entries. This is
* not a hard limit, but the access efficiency will degrade if it is
* exceeded substantially (since it's used to compute directory size and
* the hash table buckets will get overfull).

For the lock hash tables this max_size is
(MaxBackends+max_prepared_xacts) * max_locks_per_xact.
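
As an illustration (the values below are assumptions, not the reported
test's configuration):

    # (MaxBackends + max_prepared_xacts) * max_locks_per_xact, e.g. with
    # max_connections=1500, max_prepared_transactions=5,
    # max_locks_per_transaction=64:
    echo $(( (1500 + 5) * 64 ))    # 96320 expected lock-table entries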

So maybe this does not make much sense in normal operation, thus not
applicable to what Josh Berkus is reporting.

However I was talking to Josh Drake yesterday and he told me that
pg_dump was spending some significant amount of time in LOCK TABLE when
there are lots of tables (say 300k).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 17:45:15
Message-ID: 14009.1184867115@sss.pgh.pa.us
Lists: pgsql-performance

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
>> space entirely;

> Well, if there still is shared memory available, the lock hash can
> continue to grow, but it would slow down according to this comment in
> ShmemInitHash:

Right, but there's not an enormous amount of headroom in shared memory
beyond the intended size of the hash tables. I'd think that you'd start
seeing hard failures not very far beyond the point at which performance
impacts became visible. Of course this is all speculation; I quite
agree with varying the table-size parameters to see if it makes a
difference.

Josh, what sort of workload is being tested here --- read-mostly,
write-mostly, a mixture?

> However I was talking to Josh Drake yesterday and he told me that
> pg_dump was spending some significant amount of time in LOCK TABLE when
> there are lots of tables (say 300k).

I wouldn't be too surprised if there's some O(N^2) effects when a single
transaction holds that many locks, because of the linked-list proclock
data structures. This would not be relevant to Josh's case though.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 18:27:45
Message-ID: 469FAD21.4020800@commandprompt.com
Lists: pgsql-performance

Alvaro Herrera wrote:
> Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:

> So maybe this does not make much sense in normal operation, thus not
> applicable to what Josh Berkus is reporting.
>
> However I was talking to Josh Drake yesterday and he told me that
> pg_dump was spending some significant amount of time in LOCK TABLE when
> there are lots of tables (say 300k).

Less, 128k

Joshua D. Drake

>


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-19 19:04:44
Message-ID: Pine.GSO.4.64.0707191449260.28215@westnet.com
Lists: pgsql-performance

On Thu, 19 Jul 2007, Josh Berkus wrote:

> What seems to be happening is that at some specific number of
> connections average throughput drops 30% and response time quadruples or
> worse.

Could you characterize what each connection is doing and how you're
generating the load? I don't know how productive speculating about the
cause here will be until there's a test script available so other people
can see where the tipping point is on their system.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 01:19:13
Message-ID: 200707191819.13899.josh@agliodbs.com
Lists: pgsql-performance

Tom, all:

> Also, have you tried watching vmstat or local equivalent to confirm that
> the machine's not starting to swap?

We're not swapping.

> Josh, what sort of workload is being tested here --- read-mostly,
> write-mostly, a mixture?

It's a TPCC-like workload, so heavy single-row updates, and the
updates/inserts are what's being measured. For that matter, when I've seen
this before it was with heavy-write workloads and we were measuring the
number of updates/inserts and not the number of reads.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 01:45:26
Message-ID: 15685.1184895926@sss.pgh.pa.us
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Josh, what sort of workload is being tested here --- read-mostly,
>> write-mostly, a mixture?

> It's a TPCC-like workload, so heavy single-row updates, and the
> updates/inserts are what's being measured. For that matter, when I've seen
> this before it was with heavy-write workloads and we were measuring the
> number of updates/inserts and not the number of reads.

Well, if the load is a lot of short writing transactions then you'd
expect the throughput to depend on how fast stuff can be pushed down to
WAL. What have you got wal_buffers set to? Are you using a commit
delay? What's the I/O system anyway (any BB write cache on the WAL
disk?) and what wal sync method are you using?
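
(For reference, a sketch that gathers those settings in one go; the
database name is illustrative:)

    psql -d testdb -c "SELECT name, setting FROM pg_settings
                       WHERE name IN ('wal_buffers', 'commit_delay',
                                      'commit_siblings', 'wal_sync_method');"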

While I'm asking questions, exactly what were the data columns you
presented? Txn/User doesn't make much sense to me, and I'm not sure
what "response time" you were measuring either.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 04:38:39
Message-ID: 200707192138.39679.josh@agliodbs.com
Lists: pgsql-performance

Tom,

> Well, if the load is a lot of short writing transactions then you'd
> expect the throughput to depend on how fast stuff can be pushed down to
> WAL. What have you got wal_buffers set to? Are you using a commit
> delay? What's the I/O system anyway (any BB write cache on the WAL
> disk?) and what wal sync method are you using?

You know, I think Jignesh needs to be on this list so I can stop relaying
questions on a workload I didn't design. Let me get him.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 11:46:07
Message-ID: 871wf39u2o.fsf@oxford.xeocode.com
Lists: pgsql-performance


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>
>> That's an interesting thought. Let me check lock counts and see if this is
>> possibly the case.
>
> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
> space entirely

I assume you've checked the server logs and are sure that you aren't in fact
getting errors. I could, for example, envision a situation where a fraction of
the transactions are getting some error and those transactions are therefore
not being counted against the txn/s result.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 17:05:54
Message-ID: Pine.GSO.4.64.0707201237070.2776@westnet.com
Lists: pgsql-performance

On Thu, 19 Jul 2007, Josh Berkus wrote:

> It's a TPCC-like workload, so heavy single-row updates, and the
> updates/inserts are what's being measured.

There's so much going on with a TPC-C kind of workload. Has anyone ever
looked into quantifying scaling for more fundamental operations? There
are so many places a complicated workload could get caught up that
starting there is hard. I've found it's helpful to see the breaking
points for simpler operations, then compare how things change as each new
transaction element is introduced.

As an example, take a look at the MySQL SysBench tool:
http://sysbench.sourceforge.net/docs/

Specifically their "oltp" tests. Note how you can get a handle on how
simple selects scale, then simple inserts, then updates, and so on. The
only thing I've thought of that they missed is testing a trivial operation
that doesn't even touch the buffer cache ('SELECT 1'?), which would let you
quantify just general connection scaling issues.

It seems to me that you could narrow the list of possible causes here much
more quickly if you had a good handle on the upper concurrency of
lower-level operations.
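
A minimal sketch of that kind of test with pgbench (the file name and
counts are arbitrary):

    # Step the client count through the reported tipping point while
    # running a statement that does no real work, isolating connection
    # scaling from everything else.
    echo "SELECT 1;" > select1.sql
    for c in 100 500 900 1000 1100; do
        pgbench -n -f select1.sql -c $c -t 1000 testdb
    done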

[Note: it's possible to run SysBench against a PG database, but the code
is very immature. Last time I tried there were plenty of crashes and
there seemed to be some transaction wrapping issues that caused deadlocks
with some tests.]

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 17:17:35
Message-ID: 46A0EE2F.4080100@agliodbs.com
Lists: pgsql-performance

Greg,

> There's so much going on with a TPC-C kind of workload. Has anyone ever
> looked into quantifying scaling for more fundamental operations? There
> are so many places a complicated workload could get caught up that
> starting there is hard. I've found it's helpful to see the breaking
> points for simpler operations, then compare how things change as each
> new transaction element is introduced.

... eagerly awaiting Michael Doilson's PgUnitTest ....

--Josh


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 18:08:49
Message-ID: 46A0FA31.2080208@sun.com
Lists: pgsql-performance

Awww Josh,

I was just enjoying the chat on the picket fence! :-)

Anyway, the workload is mixed (reads, writes) with simple to medium
queries. The workload is known to scale well. But in order to provide
substantial input I am still trying to eliminate things that can
bottleneck. So far I have eliminated CPU (plenty free), RAM (48GB in
this server for a 32-bit PostgreSQL instance), and IO storage (used the
free RAM to put the database on /tmp and eliminate IO), and am still
trying to rule out network bottlenecks before saying for sure we have a
problem in PostgreSQL. Until that final thing is confirmed (the network,
which could very well be the case), the problem could be somewhere else.
However, the thing that worries me is the big drop rather than
throughput remaining constant out there..

Anyway, more on this within a day or two, once I add more network NICs
between the systems to eliminate network problems (even though the stats
don't show them as problems right now) and also reduce malloc lock
penalties, if any.

As for other questions:

max_locks_per_transaction is set to the default (10 I believe);
increasing it still seems to degrade the overall throughput number.

max_connections is set to 1500 for now, until I get decent scaling up to
1400-1500 users.

There are no hard failures reported anywhere. Log min duration output does
show that queries are now slowing down and taking longer.

OS is not swapping and also eliminated IO by putting the whole database
on /tmp

So while I finish adding more network connections between the two
systems (need to get cards) do enjoy the following URL :-)

http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html

Of course all postgresql.conf still remains from the old test so no
flames on that one again :-)

Regards,
Jignesh

Josh Berkus wrote:
> Tom,
>
>
>> Well, if the load is a lot of short writing transactions then you'd
>> expect the throughput to depend on how fast stuff can be pushed down to
>> WAL. What have you got wal_buffers set to? Are you using a commit
>> delay? What's the I/O system anyway (any BB write cache on the WAL
>> disk?) and what wal sync method are you using?
>>
>
> You know, I think Jignesh needs to be on this list so I can stop relaying
> questions on a workload I didn't design. Let me get him.
>
>


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 18:18:05
Message-ID: 46A0FC5D.40001@sun.com
Lists: pgsql-performance

I forgot to add one more piece of information: I also tried the same
test with 64-bit PostgreSQL and 6GB shared_buffers, and the results are
the same; it drops around the same point, which to me sounds like a
bottleneck..

More later

-Jignesh

Jignesh K. Shah wrote:
> Awww Josh,
>
> I was just enjoying the chat on the picket fence! :-)
>
> Anyway, the workload is mixed (reads, writes) with simple to medium
> queries. The workload is known to scale well. But in order to provide
> substantial input I am still trying to eliminate things that can
> bottleneck. So far I have eliminated CPU (plenty free), RAM (48GB in
> this server for a 32-bit PostgreSQL instance), and IO storage (used the
> free RAM to put the database on /tmp and eliminate IO), and am still
> trying to rule out network bottlenecks before saying for sure we have a
> problem in PostgreSQL. Until that final thing is confirmed (the network,
> which could very well be the case), the problem could be somewhere else.
> However, the thing that worries me is the big drop rather than
> throughput remaining constant out there..
>
> Anyway, more on this within a day or two, once I add more network NICs
> between the systems to eliminate network problems (even though the stats
> don't show them as problems right now) and also reduce malloc lock
> penalties, if any.
>
> As for other questions:
>
> max_locks_per_transaction is set to the default (10 I believe);
> increasing it still seems to degrade the overall throughput number.
>
> max_connections is set to 1500 for now, until I get decent scaling up
> to 1400-1500 users.
>
> There are no hard failures reported anywhere. Log min duration output
> does show that queries are now slowing down and taking longer.
>
> OS is not swapping and also eliminated IO by putting the whole
> database on /tmp
>
> So while I finish adding more network connections between the two
> systems (need to get cards) do enjoy the following URL :-)
>
> http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html
>
>
> Of course all postgresql.conf still remains from the old test so no
> flames on that one again :-)
>
> Regards,
> Jignesh
>
>
>
>
> Josh Berkus wrote:
>> Tom,
>>
>>
>>> Well, if the load is a lot of short writing transactions then you'd
>>> expect the throughput to depend on how fast stuff can be pushed down to
>>> WAL. What have you got wal_buffers set to? Are you using a commit
>>> delay? What's the I/O system anyway (any BB write cache on the WAL
>>> disk?) and what wal sync method are you using?
>>>
>>
>> You know, I think Jignesh needs to be on this list so I can stop
>> relaying questions on a workload I didn't design. Let me get him.
>>
>>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 18:23:51
Message-ID: 12002.1184955831@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> There are no hard failures reported anywhere. Log min duration output
> does show that queries are now slowing down and taking longer.
> OS is not swapping and also eliminated IO by putting the whole database
> on /tmp

Hmm. Do you see any evidence of a context swap storm (ie, a drastic
increase in the context swaps/second reading reported by vmstat)?
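
(A sketch of that check:)

    # Linux: watch the cs column for a sudden jump as the user count
    # crosses the threshold; on Solaris, the per-CPU csw/icsw columns.
    vmstat 5
    mpstat 5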

regards, tom lane


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 18:56:31
Message-ID: 46A1055F.9010304@sun.com
Lists: pgsql-performance

Yes, I did see an increase in context switches and CPU migrations at that
point using mpstat.

Regards,
Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> There are no hard failures reported anywhere. Log min duration output
>> does show that queries are now slowing down and taking longer.
>> OS is not swapping and also eliminated IO by putting the whole database
>> on /tmp
>>
>
> Hmm. Do you see any evidence of a context swap storm (ie, a drastic
> increase in the context swaps/second reading reported by vmstat)?
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 19:13:13
Message-ID: 12445.1184958793@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> Yes, I did see an increase in context switches and CPU migrations at
> that point using mpstat.

So follow that up --- try to determine which lock is being contended
for. There's some very crude code in the sources that you can enable
with -DLWLOCK_STATS, but probably DTrace would be a better tool.
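
(For reference, a sketch of enabling the crude stats; the exact build
incantation is an assumption, and the counts are written to the server
log as backends exit:)

    ./configure CFLAGS="-O2 -DLWLOCK_STATS"
    gmake && gmake install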

regards, tom lane


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 19:46:30
Message-ID: 46A11116.5030905@sun.com
Lists: pgsql-performance


Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>> Yes, I did see an increase in context switches and CPU migrations at
>> that point using mpstat.
>
> So follow that up --- try to determine which lock is being contended
> for. There's some very crude code in the sources that you can enable
> with -DLWLOCK_STATS, but probably DTrace would be a better tool.
>
> regards, tom lane

Using plockstat -A -s 5 -p $pid

on bgwriter: doesn't report anything

On one of the many connections:

This one is hard to read easily. By default, plockstat monitors all lock
contention events, gathers frequency and timing data about those events,
and displays the data in decreasing frequency order, so that the most
common events appear first.

^Cbash-3.00# plockstat -A -s 5 -p 6401
^C
Mutex hold

-------------------------------------------------------------------------------
Count nsec Lock Caller
59 186888 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
16384 | | 1 libumem.so.1`process_free+0x12c
32768 |@@@@@ | 14 postgres`AllocSetDelete+0x98
65536 |@@ | 5
postgres`MemoryContextDelete+0x78
131072 | | 0 postgres`CommitTransaction+0x240
262144 |@@@@@@@@@@@@@@@ | 39
-------------------------------------------------------------------------------
Count nsec Lock Caller
530 12226 0x10059e280 libumem.so.1`umem_cache_alloc+0x200

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@ | 338
libumem.so.1`umem_cache_alloc+0x200
8192 |@ | 24 libumem.so.1`umem_alloc+0x5c
16384 |@ | 37 libumem.so.1`malloc+0x40
32768 |@@@@@ | 131 postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count nsec Lock Caller
324 10214 0x100578030 libumem.so.1`vmem_xfree+0x164

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@ | 192 libumem.so.1`vmem_xfree+0x164
8192 |@@@@ | 56 libumem.so.1`process_free+0x12c
16384 |@ | 26 postgres`AllocSetDelete+0x98
32768 |@@@ | 50
postgres`MemoryContextDelete+0x78
-------------------------------------------------------------------------------
Count nsec Lock Caller
161 13585 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@ | 118 libumem.so.1`process_free+0x12c
8192 | | 4 postgres`AllocSetDelete+0x98
16384 |@ | 10
postgres`MemoryContextDelete+0x78
32768 |@@@ | 24 postgres`PortalDrop+0x160
65536 | | 3
131072 | | 0
262144 | | 2
-------------------------------------------------------------------------------
Count nsec Lock Caller
326 6081 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xalloc+0x630

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@ | 170 libumem.so.1`vmem_xalloc+0x630
8192 |@@@@@@@@@@@ | 155 libumem.so.1`vmem_alloc+0x1f8
16384 | | 1 libumem.so.1`vmem_xalloc+0x524
libumem.so.1`vmem_alloc+0x1f8
-------------------------------------------------------------------------------
Count nsec Lock Caller
326 5867 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_alloc+0x248

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@ | 185 libumem.so.1`vmem_alloc+0x248
8192 |@@@@@@@@@@ | 141
libumem.so.1`vmem_sbrk_alloc+0x30
libumem.so.1`vmem_xalloc+0x524
libumem.so.1`vmem_alloc+0x1f8
-------------------------------------------------------------------------------
Count nsec Lock Caller
318 5873 0x100578030 libumem.so.1`vmem_alloc+0x1d0

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@ | 228 libumem.so.1`vmem_alloc+0x1d0
8192 |@@@@@ | 78 libumem.so.1`umem_alloc+0xec
16384 | | 6 libumem.so.1`malloc+0x40
32768 | | 6 postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count nsec Lock Caller
326 5591 0x100578030 libumem.so.1`vmem_xalloc+0x630

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@ | 213 libumem.so.1`vmem_xalloc+0x630
8192 |@@@@@@@@ | 112 libumem.so.1`vmem_alloc+0x1f8
16384 | | 0 libumem.so.1`umem_alloc+0xec
32768 | | 1 libumem.so.1`malloc+0x40
-------------------------------------------------------------------------------
Count nsec Lock Caller
324 5208 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xfree+0x164

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@ | 236 libumem.so.1`vmem_xfree+0x164
8192 |@@@@@@ | 88 libumem.so.1`process_free+0x12c
postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
-------------------------------------------------------------------------------
Count nsec Lock Caller
326 4108 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_alloc+0x1d0

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 325 libumem.so.1`vmem_alloc+0x1d0
8192 | | 1 libumem.so.1`vmem_xalloc+0x524
libumem.so.1`vmem_alloc+0x1f8
libumem.so.1`umem_alloc+0xec
-------------------------------------------------------------------------------
Count nsec Lock Caller
326 4108 0x100578030 libumem.so.1`vmem_xalloc+0x50c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 325 libumem.so.1`vmem_xalloc+0x50c
8192 | | 1 libumem.so.1`vmem_alloc+0x1f8
libumem.so.1`umem_alloc+0xec
libumem.so.1`malloc+0x40
-------------------------------------------------------------------------------
Count nsec Lock Caller
326 4096 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xalloc+0x50c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 326 libumem.so.1`vmem_xalloc+0x50c
libumem.so.1`vmem_alloc+0x1f8
libumem.so.1`vmem_xalloc+0x524
libumem.so.1`vmem_alloc+0x1f8
-------------------------------------------------------------------------------
Count nsec Lock Caller
240 5444 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@ | 167 libumem.so.1`process_free+0x12c
8192 |@@@@@@@ | 72 postgres`AllocSetDelete+0x98
16384 | | 0
postgres`MemoryContextDelete+0x78
32768 | | 1 postgres`ExecutorEnd+0x40
-------------------------------------------------------------------------------
Count nsec Lock Caller
123 9057 0x10059e1d0 libumem.so.1`umem_depot_alloc+0xb8

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@ | 60
libumem.so.1`umem_depot_alloc+0xb8
8192 |@@@@ | 24
libumem.so.1`umem_cache_free+0xc4
16384 |@@@@@@@ | 37 libumem.so.1`process_free+0x12c
32768 | | 2 postgres`AllocSetDelete+0x98
-------------------------------------------------------------------------------
Count nsec Lock Caller
200 4935 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@ | 185 libumem.so.1`process_free+0x12c
8192 | | 4 postgres`AllocSetDelete+0x98
16384 |@ | 10
postgres`MemoryContextDelete+0x78
32768 | | 1 postgres`ExecutorEnd+0x40
-------------------------------------------------------------------------------
Count nsec Lock Caller
164 5219 0x100595700 libumem.so.1`umem_cache_alloc+0x200

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@ | 121
libumem.so.1`umem_cache_alloc+0x200
8192 |@@@@@@ | 42 libumem.so.1`umem_alloc+0x5c
16384 | | 1 libumem.so.1`malloc+0x40
postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count nsec Lock Caller
122 6748 0x10059e1d0 libumem.so.1`umem_depot_alloc+0xb8

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@ | 43
libumem.so.1`umem_depot_alloc+0xb8
8192 |@@@@@@@@@@@@@@@ | 79
libumem.so.1`umem_cache_alloc+0xa0
libumem.so.1`umem_alloc+0x5c
libumem.so.1`malloc+0x40
-------------------------------------------------------------------------------
Count nsec Lock Caller
163 4146 0x100595700 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 161 libumem.so.1`process_free+0x12c
8192 | | 2 postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
postgres`PortalDrop+0x160
-------------------------------------------------------------------------------
Count nsec Lock Caller
50 12615 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@ | 28 libumem.so.1`process_free+0x12c
8192 |@ | 3 postgres`AllocSetDelete+0x98
16384 |@@@ | 8
postgres`MemoryContextDelete+0x78
32768 |@@@@@ | 11 postgres`FreeExecutorState+0x6c
-------------------------------------------------------------------------------
Count nsec Lock Caller
123 4096 0x10059e1d0 libumem.so.1`umem_cache_free+0xec

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 123
libumem.so.1`umem_cache_free+0xec
libumem.so.1`process_free+0x12c
postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
-------------------------------------------------------------------------------
Count nsec Lock Caller
122 4096 0x10059e1d0 libumem.so.1`umem_cache_alloc+0xc4

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 122
libumem.so.1`umem_cache_alloc+0xc4
libumem.so.1`umem_alloc+0x5c
libumem.so.1`malloc+0x40
postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count nsec Lock Caller
37 7970 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@ | 2 libumem.so.1`process_free+0x12c
8192 |@@@@@@@@@@@@@@@@@@@@@@ | 35 postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78

postgres`exec_parse_message+0x130
-------------------------------------------------------------------------------
Count nsec Lock Caller
37 5867 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@ | 33 libumem.so.1`process_free+0x12c
8192 |@ | 2 postgres`AllocSetDelete+0x98
16384 | | 0
postgres`MemoryContextDelete+0x78
32768 |@ | 2
postgres`exec_parse_message+0x130
-------------------------------------------------------------------------------
Count nsec Lock Caller
39 4516 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@ | 35 libumem.so.1`process_free+0x12c
8192 |@@ | 4 postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
postgres`PortalDrop+0x160
-------------------------------------------------------------------------------
Count nsec Lock Caller
37 4428 0x10058b700 libumem.so.1`umem_cache_alloc+0x200

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@ | 34
libumem.so.1`umem_cache_alloc+0x200
8192 |@ | 3 libumem.so.1`umem_alloc+0x5c
libumem.so.1`malloc+0x40

postgres`base_yy_scan_buffer+0x38
-------------------------------------------------------------------------------
Count nsec Lock Caller
37 4206 0x10058b700 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 36 libumem.so.1`process_free+0x12c
8192 | | 1 postgres`scanner_finish+0x50
postgres`raw_parser+0x3c
postgres`pg_parse_query+0x54
-------------------------------------------------------------------------------
Count nsec Lock Caller
11 10426 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@ | 6 libumem.so.1`process_free+0x12c
8192 |@@@@@@ | 3 postgres`AllocSetDelete+0x98
16384 | | 0
postgres`MemoryContextDelete+0x78
32768 |@@@@ | 2 postgres`ExecEndAgg+0x68
-------------------------------------------------------------------------------
Count nsec Lock Caller
8 5120 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@ | 6 libumem.so.1`process_free+0x12c
8192 |@@@@@@ | 2 postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
postgres`ExecEndSort+0x24
-------------------------------------------------------------------------------
Count nsec Lock Caller
8 4096 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 8 libumem.so.1`process_free+0x12c
postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
postgres`ExecEndSort+0x24
-------------------------------------------------------------------------------
Count nsec Lock Caller
8 4096 0x100578030 libumem.so.1`vmem_alloc+0x1d0

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 8 libumem.so.1`vmem_alloc+0x1d0
libumem.so.1`umem_alloc+0xec
libumem.so.1`malloc+0x40
postgres`AllocSetAlloc+0x314
-------------------------------------------------------------------------------
Count nsec Lock Caller
3 4096 0x10059e280 libumem.so.1`process_free+0x12c

nsec ---- Time Distribution --- count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 3 libumem.so.1`process_free+0x12c
postgres`AllocSetDelete+0x98

postgres`MemoryContextDelete+0x78
postgres`tbm_free+0x10
bash-3.00#


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 20:16:01
Message-ID: 14909.1184962561@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> Tom Lane wrote:
>> So follow that up --- try to determine which lock is being contended
>> for. There's some very crude code in the sources that you can enable
>> with -DLWLOCK_STATS, but probably DTrace would be a better tool.

> Using plockstat -A -s 5 -p $pid

I don't know what that is, but it doesn't appear to have anything to do
with Postgres LWLocks or spinlocks, which are the locks I was thinking of.
Try asking Robert Lor about this --- IIRC he had some dtrace probes to
work with our locks.

regards, tom lane


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 20:51:39
Message-ID: 46A1205B.8010205@sun.com
Lists: pgsql-performance

Sorry..

These are Solaris mutex locks used by the PostgreSQL process.

What it's saying is that there are holds/waits in trying to get locks
which are taken at the Solaris user-library level, called from the
PostgreSQL functions. For example, both of the following functions are
hitting on the same mutex lock 0x10059e280 in a Solaris library call:
postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4

I need to enable the DTrace probes on my builds

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> Tom Lane wrote:
>>
>>> So follow that up --- try to determine which lock is being contended
>>> for. There's some very crude code in the sources that you can enable
>>> with -DLWLOCK_STATS, but probably DTrace would be a better tool.
>>>
>
>
>> Using plockstat -A -s 5 -p $pid
>>
>
> I don't know what that is, but it doesn't appear to have anything to do
> with Postgres LWLocks or spinlocks, which are the locks I was thinking of.
> Try asking Robert Lor about this --- IIRC he had some dtrace probes to
> work with our locks.
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 20:57:34
Message-ID: 15343.1184965054@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> What it's saying is that there are holds/waits in trying to get locks
> which are taken at the Solaris user-library level, called from the
> PostgreSQL functions. For example, both of the following functions are
> hitting on the same mutex lock 0x10059e280 in a Solaris library call:
> postgres`AllocSetDelete+0x98
> postgres`AllocSetAlloc+0x1c4

That's a perfect example of the sort of useless overhead that I was
complaining of just now in pgsql-patches. Having malloc/free use
an internal mutex is necessary in multi-threaded programs, but the
backend isn't multi-threaded. And yet, apparently you can't turn
that off in Solaris.

(Fortunately, the palloc layer is probably insulating us from malloc's
performance enough that this isn't a huge deal. But it's annoying.)

regards, tom lane


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 21:24:33
Message-ID: 46A12811.4020205@sun.com
Lists: pgsql-performance

True, you can't switch off the locks, since libthread has been folded
into libc in Solaris 10.

Anyway, just to give you an idea of the increase in context switching at
the break point, here is mpstat output (taken at 10-second intervals) on
this 8-socket Sun Fire V890.

icsw (involuntary context switches) stays low up to about the 950-1000
user mark, after which a context-switch storm starts above the 1000-1050
user mark and total throughput drops about 30% instantaneously. I will
try rebuilding PostgreSQL with DTrace probes to get more clues. (NOTE:
you will see one CPU (cpuid 22) doing more system work... that's the one
handling the network interrupts.)

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 57 0 27 108 6 4072 98 1749 416 1 7763 47 13 0 40
1 46 0 24 22 6 4198 11 1826 427 0 7547 45 13 0 42
2 42 0 34 104 8 4103 91 1682 424 1 7797 46 13 0 41
3 51 0 22 21 6 4125 10 1734 435 0 7399 45 13 0 43
4 65 0 27 19 6 4015 8 1706 411 0 7292 44 15 0 41
5 54 0 21 21 6 4297 10 1702 464 0 7708 45 13 0 42
6 36 0 16 66 47 4218 12 1713 426 0 7685 47 11 0 42
7 40 0 100 318 206 3699 10 1534 585 0 6851 45 14 0 41
16 41 0 30 87 5 3780 78 1509 401 1 7604 45 13 0 42
17 39 0 24 22 5 3970 12 1631 408 0 7265 44 12 0 44
18 42 0 24 99 5 3829 89 1519 401 1 7343 45 12 0 43
19 39 0 31 78830 5 3588 8 1509 400 0 6629 43 13 0 44
20 22 0 20 19 6 3925 9 1577 419 0 7364 44 12 0 44
21 38 0 31 23 5 3792 13 1566 407 0 7133 45 12 0 44
22 8 0 110 7053 7045 1641 8 728 838 0 2917 16 50 0 33
23 62 0 29 21 5 3985 10 1579 449 0 7368 44 12 0 44
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 13 0 27 123 6 4228 113 1820 433 1 8084 49 13 0 38
1 16 0 63 26 6 4253 15 1875 420 0 7754 47 14 0 39
2 11 0 31 110 8 4178 97 1741 425 1 8095 48 14 0 38
3 8 0 24 20 6 4257 9 1818 444 0 7807 47 13 0 40
4 13 0 54 28 6 4145 17 1774 426 1 7732 46 16 0 38
5 12 0 35 23 6 4412 12 1775 447 0 8249 48 13 0 39
6 8 0 24 38 15 4323 14 1760 422 0 8016 49 11 0 39
7 8 0 120 323 206 3801 15 1599 635 0 7290 47 15 0 38
16 11 0 44 107 5 3896 98 1582 393 1 7997 47 15 0 39
17 15 0 29 24 5 4120 14 1716 416 0 7648 46 13 0 41
18 9 0 35 113 5 3933 103 1594 399 1 7714 47 13 0 40
19 8 0 34 83271 5 3702 12 1564 403 0 7010 45 14 0 41
20 7 0 28 27 6 3997 16 1624 400 0 7676 46 13 0 41
21 8 0 28 25 5 3997 15 1664 402 0 7658 47 12 0 41
22 4 0 97 7741 7731 1586 11 704 906 0 2933 17 51 0 32
23 13 0 28 25 5 4144 15 1658 437 0 7810 47 12 0 41
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 141 315 6 9262 301 2812 330 0 10905 49 16 0 35
1 1 0 153 199 6 9400 186 2808 312 0 11066 48 16 0 37
2 0 0 140 256 8 8798 242 2592 310 0 10111 47 15 0 38
3 1 0 141 189 6 8803 172 2592 314 0 10171 47 15 0 39
4 0 0 120 214 6 9540 207 2801 322 0 10531 46 17 0 36
5 1 0 152 180 6 8764 161 2564 342 0 9904 47 15 0 38
6 1 0 107 344 148 8180 181 2512 290 0 9314 51 14 0 35
7 0 0 665 443 204 8733 153 2574 404 0 9892 43 21 0 37
16 0 0 113 217 5 6446 201 1975 265 0 7552 45 12 0 44
17 0 0 107 153 5 6568 140 2021 274 0 7586 44 11 0 45
18 0 0 121 215 5 6072 201 1789 276 1 7690 44 12 0 44
19 1 0 102 47142 5 6123 126 1829 262 0 7185 43 12 0 45
20 0 0 102 143 6 6451 129 1939 262 0 7450 43 13 0 44
21 1 0 106 150 5 6538 133 1997 285 0 7425 44 11 0 44
22 0 0 494 5949 5876 3586 73 1040 399 0 4058 26 39 0 34
23 0 0 102 159 5 6393 142 1942 324 0 7226 43 12 0 46
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 217 441 7 10763 426 3234 363 0 12449 47 18 0 35
1 0 0 210 322 7 11113 309 3273 351 0 12527 46 17 0 37
2 1 0 212 387 8 10306 370 2977 354 0 11320 45 16 0 38
3 0 0 230 276 7 10332 257 2947 341 0 11901 43 16 0 40
4 0 0 234 306 7 11324 290 3265 352 0 12805 45 18 0 37
5 0 0 212 284 7 10590 262 3042 388 0 11789 44 17 0 39
6 1 0 154 307 48 9583 241 2903 324 0 10564 50 15 0 35
7 0 0 840 535 206 10354 247 3035 428 0 11700 42 22 0 37
16 0 0 169 303 5 7446 286 2250 290 0 8361 42 13 0 45
17 0 0 173 240 5 7640 225 2288 295 0 8674 41 13 0 47
18 0 0 170 289 5 7445 270 2108 286 0 8167 41 12 0 47
19 0 0 176 51118 5 7365 197 2138 288 0 7934 40 13 0 47
20 1 0 172 222 6 7835 204 2323 298 0 8759 40 14 0 46
21 0 0 167 233 5 7749 218 2339 326 0 8264 42 13 0 46
22 0 0 749 6612 6516 4173 97 1166 421 0 4741 23 44 0 33
23 0 0 181 239 6 7709 219 2258 383 0 8402 41 12 0 47
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 198 439 6 10364 417 3113 327 0 11962 49 17 0 34
1 0 0 210 299 6 10655 282 3135 346 0 12463 47 17 0 36
2 0 0 202 352 8 9960 332 2890 320 0 11261 47 16 0 37
3 0 0 182 276 6 9950 255 2857 334 0 11021 46 16 0 38
4 0 0 200 305 6 10841 286 3127 325 0 12440 48 18 0 35
5 0 0 240 286 6 9983 272 2912 358 0 11450 46 16 0 37
6 0 0 153 323 81 9062 233 2767 300 0 9675 49 18 0 33
7 0 0 850 556 206 10027 271 2910 415 0 11048 43 22 0 35
16 0 0 152 306 5 7261 291 2216 266 0 8055 44 12 0 44
17 0 0 151 236 5 7193 217 2170 283 0 8099 43 12 0 45
18 0 0 170 263 5 7008 246 2009 254 0 7836 43 12 0 46
19 0 0 165 47738 5 6824 197 1989 273 0 7663 42 12 0 46
20 0 0 188 217 6 7496 197 2222 280 0 8435 43 13 0 44
21 0 0 179 248 5 7352 234 2233 309 0 8237 43 12 0 44
22 0 0 813 6041 5963 4006 82 1125 448 0 4442 25 42 0 33
23 0 0 162 241 5 7364 225 2170 355 0 7720 43 11 0 45

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> What it's saying is that there are holds/waits in trying to get locks
>> which are taken at the Solaris user-library level, called from the
>> PostgreSQL functions. For example, both of the following functions are
>> hitting on the same mutex lock 0x10059e280 in a Solaris library call:
>> postgres`AllocSetDelete+0x98
>> postgres`AllocSetAlloc+0x1c4
>>
>
> That's a perfect example of the sort of useless overhead that I was
> complaining of just now in pgsql-patches. Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded. And yet, apparently you can't turn
> that off in Solaris.
>
> (Fortunately, the palloc layer is probably insulating us from malloc's
> performance enough that this isn't a huge deal. But it's annoying.)
>
> regards, tom lane
>


From: David Boreham <david_list(at)boreham(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-21 03:31:14
Message-ID: 46A17E02.2090807@boreham.org
Lists: pgsql-performance

Tom Lane wrote:
> Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded.
>
Hmm... confused. I'm not following why, then, there is contention for the
mutex. Surely this has to be some other mutex that is in contention, not a
heap lock?

It'd be handy to see the call stack for the wait state -- if the thing is
spending a significant proportion of its time in contention, it should be
easy to get that with a simple tool such as pstack or a debugger.
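
A sketch of that (the pid is illustrative):

    # Snapshot a busy backend's stack a few times; a contended mutex
    # would show up as repeated samples inside lock-acquisition frames.
    for i in 1 2 3 4 5; do pstack 6401; sleep 1; done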


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-21 04:26:10
Message-ID: 20297.1184991970@sss.pgh.pa.us
Lists: pgsql-performance

David Boreham <david_list(at)boreham(dot)org> writes:
> Tom Lane wrote:
>> Having malloc/free use
>> an internal mutex is necessary in multi-threaded programs, but the
>> backend isn't multi-threaded.

> Hmm...confused. I'm not following why then there is contention for the
> mutex.

There isn't any contention for that mutex; Jignesh's results merely show
that it was taken and released a lot of times.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 08:51:08
Message-ID: 1185180668.4284.149.camel@ebony.site
Lists: pgsql-performance

On Fri, 2007-07-20 at 16:57 -0400, Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> > What it's saying is that there are holds/waits in trying to get locks
> > which are taken at the Solaris user-library level, called from the
> > PostgreSQL functions. For example, both of the following functions are
> > hitting on the same mutex lock 0x10059e280 in a Solaris library call:
> > postgres`AllocSetDelete+0x98
> > postgres`AllocSetAlloc+0x1c4
>
> That's a perfect example of the sort of useless overhead that I was
> complaining of just now in pgsql-patches. Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded. And yet, apparently you can't turn
> that off in Solaris.
>
> (Fortunately, the palloc layer is probably insulating us from malloc's
> performance enough that this isn't a huge deal. But it's annoying.)

There is one thing that the palloc layer doesn't handle: EState. All
other memory contexts have a very well chosen initial allocation that
prevents mallocs during low-medium complexity OLTP workloads.

EState is about 8300 bytes, so just above the large allocation limit.
This means that every time we request an EState, i.e. at least once per
statement, we need to malloc() and then later free() it.

Would it be worth a special case in the palloc system to avoid having to
repeatedly issue external memory allocation calls?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 14:11:41
Message-ID: 23848.1185199901@sss.pgh.pa.us
Lists: pgsql-performance

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> EState is about 8300 bytes,

What?

(gdb) p sizeof(EState)
$1 = 112

This is on a 32-bit machine, but even on 64-bit it wouldn't be more than
double that.

> Would it be worth a special case in the palloc system to avoid having to
> repeatedly issue external memory allocation calls?

The appropriate hack would be to change the AllocSetContextCreate
initial-size parameter for the containing context. But I really have
no idea what you're on about.
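
For reference, that call has this shape in the 8.2-era sources (a
sketch, with the stock default sizes shown; the executor's own context
is created this way in CreateExecutorState):

/* Raising initBlockSize (or minContextSize) for the containing context
 * makes its first block large enough that small per-statement requests
 * never reach malloc() individually. */
MemoryContext cxt = AllocSetContextCreate(CurrentMemoryContext,
                                          "ExecutorState",
                                          ALLOCSET_DEFAULT_MINSIZE,   /* 0 */
                                          ALLOCSET_DEFAULT_INITSIZE,  /* 8K */
                                          ALLOCSET_DEFAULT_MAXSIZE);  /* 8MB */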

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 14:39:16
Message-ID: 1185201556.4284.243.camel@ebony.site
Lists: pgsql-performance

On Mon, 2007-07-23 at 10:11 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > EState is about 8300 bytes,
>
> What?
>
> (gdb) p sizeof(EState)
> $1 = 112
>
> This is on a 32-bit machine, but even on 64-bit it wouldn't be more than
> double that.
>
> > Would it be worth a special case in the palloc system to avoid having to
> > repeatedly issue external memory allocation calls?
>
> The appropriate hack would be to change the AllocSetContextCreate
> initial-size parameter for the containing context. But I really have
> no idea what you're on about.

I looked at this last May and my notes say "ExecutorState". I guess that
was wrong, but my analysis showed there was a single malloc of 8228
bytes happening once per query during my tests.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 14:54:58
Message-ID: 25095.1185202498@sss.pgh.pa.us
Lists: pgsql-performance

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> I looked at this last May and my notes say "ExecutorState". I guess that
> was wrong, but my analysis showed there was a single malloc of 8228
> bytes happening once per query during my tests.

Well, if you can track down where it's coming from, we could certainly
hack the containing context's parameters. But EState's not it.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 15:47:43
Message-ID: 1185205663.4284.249.camel@ebony.site
Lists: pgsql-performance

On Mon, 2007-07-23 at 10:54 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > I looked at this last May and my notes say "ExecutorState". I guess that
> > was wrong, but my analysis showed there was a single malloc of 8228
> > bytes happening once per query during my tests.
>
> Well, if you can track down where it's coming from, we could certainly
> hack the containing context's parameters. But EState's not it.

Well, I discovered there is an allocation of 8232 bytes (inflation...)
made once per statement by a memory context called... ExecutorState.
Still not sure exactly which allocation this is, but it's definitely
once per statement on pgbench, which should narrow it down. Plan,
query, etc.?

I don't see a way to hack the allocation, since the max chunk size is
8K.
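
To illustrate why (a rough sketch, not the verbatim aset.c logic):
requests above the chunk limit can't be served from the context's
pre-allocated blocks, so each one costs a dedicated malloc()/free()
cycle no matter how the context is sized.

#include <stdlib.h>

#define ALLOC_CHUNK_LIMIT 8192          /* the 8K limit mentioned above */

static char   block[64 * 1024];         /* stand-in for a context block */
static size_t used = 0;

static void *context_alloc_sketch(size_t size)
{
    if (size > ALLOC_CHUNK_LIMIT)
        return malloc(size);            /* dedicated single-chunk block */
    void *p = block + used;             /* cheap pointer-bump allocation */
    used += size;
    return p;
}

So the 8232-byte request always falls through to malloc().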

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 16:35:28
Message-ID: 27521.1185208528@sss.pgh.pa.us
Lists: pgsql-performance

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> Well, I discovered there is an allocation of 8232 bytes (inflation...)
> made once per statement by a memory context called... ExecutorState.
> Still not sure exactly which allocation this is, but it's definitely
> once per statement on pgbench, which should narrow it down. Plan,
> query, etc.?

Are you working with stock sources? The only allocation exceeding 1K
that I can see during pgbench is BTScanOpaqueData, which is 6600 bytes.
(Checked by setting a conditional breakpoint on AllocSetAlloc.) The
path that allocates a single-chunk block is never taken at all.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 17:37:41
Message-ID: 1185212261.4284.270.camel@ebony.site
Lists: pgsql-performance

On Mon, 2007-07-23 at 16:48 +0100, Simon Riggs wrote:
> On Mon, 2007-07-23 at 10:54 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > > I looked at this last May and my notes say "ExecutorState". I guess that
> > > was wrong, but my analysis showed there was a single malloc of 8228
> > > bytes happening once per query during my tests.
> >
> > Well, if you can track down where it's coming from, we could certainly
> > hack the containing context's parameters. But EState's not it.
>
> Well, I discovered there is an allocation of 8232 bytes (inflation...)
> made once per statement by a memory context called... ExecutorState.
> Still not sure exactly which allocation this is, but it's definitely
> once per statement on pgbench, which should narrow it down. Plan,
> query, etc.?
>
> I don't see a way to hack the allocation, since the max chunk size is
> 8K.

It is the allocation of BTScanOpaqueData called from btrescan() in
nbtree.c

currPos and markPos are defined as BTScanPosData, which embeds an array
of BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which
seems wasteful since markPos is only ever used during merge joins. Most
of that space isn't even used during merge joins either; we just do that
to slightly optimise the speed of the restore during merge joins.

Seems like we should allocate the memory when we do the first mark.
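
A rough sketch of the layout being described (field lists abridged and
partly hypothetical -- the real definitions live in nbtree.h, and the
array length tracks MaxIndexTuplesPerPage, which depends on BLCKSZ and
MAXALIGN):

#define MAX_ITEMS 512                    /* stand-in for MaxIndexTuplesPerPage */

typedef struct ItemSketch
{
    char           heapTid[6];           /* ItemPointerData */
    unsigned short indexOffset;          /* OffsetNumber */
} ItemSketch;                            /* 8 bytes */

typedef struct PosSketch
{
    int        buf;                      /* pinned buffer, if any */
    int        firstItem, lastItem, itemIndex;
    ItemSketch items[MAX_ITEMS];         /* one entry per matching tuple */
} PosSketch;                             /* ~4KB each */

typedef struct OpaqueSketch
{
    PosSketch currPos;                   /* always used */
    PosSketch markPos;                   /* only populated at mark/restore,
                                          * i.e. during merge joins */
} OpaqueSketch;                          /* 2 x ~4KB = the ~8232 bytes */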

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 17:50:53
Message-ID: 1185213053.4284.278.camel@ebony.site
Lists: pgsql-performance

On Mon, 2007-07-23 at 12:35 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > Well, I discovered there is an allocation of 8232 bytes (inflation...)
> > made once per statement by a memory context called... ExecutorState.
> > Still not sure exactly which allocation this is, but it's definitely
> > once per statement on pgbench, which should narrow it down. Plan,
> > query, etc.?
>
> Are you working with stock sources? The only allocation exceeding 1K
> that I can see during pgbench is BTScanOpaqueData, which is 6600 bytes.
> (Checked by setting a conditional breakpoint on AllocSetAlloc.) The
> path that allocates a single-chunk block is never taken at all.

I do have the bitmap patch currently applied, but it doesn't touch that
part of the code.

(gdb) p size
$1 = 8232

(gdb) p sizeof(int)
$2 = 4

(gdb) p sizeof(BTScanPosData)
$3 = 4104

Since my notes say I got 8228 last year, that seems reasonable.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 18:19:20
Message-ID: 1545.1185214760@sss.pgh.pa.us
Lists: pgsql-performance

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> currPos and markPos are defined as BTScanPosData, which embeds an array
> of BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which
> seems wasteful since markPos is only ever used during merge joins. Most
> of that space isn't even used during merge joins either; we just do that
> to slightly optimise the speed of the restore during merge joins.

Ah. I was seeing it as 6600 bytes on HPPA and 6608 on x86_64, but
I forgot that both of those architectures have MAXALIGN = 8. On a
MAXALIGN = 4 machine, MaxIndexTuplesPerPage will be significantly
larger, leading to larger BTScanPosData.

Not sure it's worth fooling with, given that these days almost everyone
who's seriously concerned about performance is probably using 64-bit
hardware. One less malloc cycle per indexscan is never going to be a
measurable saving anyway...

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-23 18:30:48
Message-ID: 1185215448.4284.295.camel@ebony.site
Lists: pgsql-performance

On Mon, 2007-07-23 at 14:19 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > currPos and markPos are defined as BTScanPosData, which embeds an array
> > of BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which
> > seems wasteful since markPos is only ever used during merge joins. Most
> > of that space isn't even used during merge joins either; we just do that
> > to slightly optimise the speed of the restore during merge joins.
>
> Ah. I was seeing it as 6600 bytes on HPPA and 6608 on x86_64, but
> I forgot that both of those architectures have MAXALIGN = 8. On a
> MAXALIGN = 4 machine, MaxIndexTuplesPerPage will be significantly
> larger, leading to larger BTScanPosData.
>
> Not sure it's worth fooling with, given that these days almost everyone
> who's seriously concerned about performance is probably using 64-bit
> hardware. One less malloc cycle per indexscan is never going to be a
> measurable saving anyway...

Oh sure, I was thinking to avoid Solaris' mutex by avoiding malloc()
completely.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-25 22:58:53
Message-ID: 46A7D5AD.9000706@sun.com
Lists: pgsql-performance

Here is how I got the numbers.
I had about 1600 users log in to PostgreSQL. Then I started the run with
500 users, and using DTrace I started tracking PostgreSQL locking "as
viewed from one user/connection". The echo statements indicate how many
users were active at that point and how throughput was performing. All
I/O is done on /tmp, which means on a RAM disk.

bash-3.00# echo 500 users - baseline number
500 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
FirstLockMgrLock Exclusive 1
RelCacheInitLock Exclusive 2
SInvalLock Exclusive 2
WALInsertLock Exclusive 10
BufMappingLock Exclusive 12
CheckpointLock Shared 29
CheckpointStartLock Shared 29
OidGenLock Exclusive 29
XidGenLock Exclusive 29
FirstLockMgrLock Shared 33
CheckpointStartLock Exclusive 78
FreeSpaceLock Exclusive 114
OidGenLock Shared 126
XidGenLock Shared 152
ProcArrayLock Shared 482

Lock Id Combined Time (ns)
SInvalLock 29800
RelCacheInitLock 30300
BufMappingLock 168800
FirstLockMgrLock 414300
FreeSpaceLock 1281700
ProcArrayLock 7869900
WALInsertLock 11113200
CheckpointStartLock 13494700
OidGenLock 25719100
XidGenLock 26443300
CheckpointLock 194267800

bash-3.00# echo 600 users - Throughput rising
600 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
RelCacheInitLock Exclusive 1
SInvalLock Exclusive 1
BufMappingLock Exclusive 2
CLogControlLock Exclusive 2
WALInsertLock Exclusive 11
FirstLockMgrLock Shared 20
CheckpointLock Shared 24
CheckpointStartLock Shared 24
OidGenLock Exclusive 24
XidGenLock Exclusive 24
CheckpointStartLock Exclusive 72
FreeSpaceLock Exclusive 102
OidGenLock Shared 106
XidGenLock Shared 128
ProcArrayLock Shared 394

Lock Id Combined Time (ns)
SInvalLock 15600
RelCacheInitLock 15700
BufMappingLock 31000
CLogControlLock 41500
FirstLockMgrLock 289000
FreeSpaceLock 3045400
CheckpointStartLock 7371800
WALInsertLock 9383200
ProcArrayLock 10457900
OidGenLock 20005900
XidGenLock 20331500
CheckpointLock 187067900

bash-3.00# echo 700 users - Throughput rising
700 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
RelCacheInitLock Exclusive 1
SInvalLock Exclusive 1
BufMappingLock Exclusive 2
WALInsertLock Exclusive 17
CheckpointLock Shared 33
CheckpointStartLock Shared 33
OidGenLock Exclusive 33
XidGenLock Exclusive 33
FirstLockMgrLock Shared 81
CheckpointStartLock Exclusive 87
FreeSpaceLock Exclusive 124
OidGenLock Shared 125
XidGenLock Shared 150
ProcArrayLock Shared 500

Lock Id Combined Time (ns)
RelCacheInitLock 15100
SInvalLock 15400
BufMappingLock 47400
FirstLockMgrLock 3021000
FreeSpaceLock 3794300
WALInsertLock 7567300
XidGenLock 18427400
ProcArrayLock 20884000
CheckpointStartLock 24084900
OidGenLock 26399500
CheckpointLock 256549800

bash-3.00# echo 800 users - Throughput rising
800 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
BufMappingLock Exclusive 1
RelCacheInitLock Exclusive 1
SInvalLock Exclusive 1
WALWriteLock Exclusive 1
WALInsertLock Exclusive 11
CheckpointLock Shared 27
CheckpointStartLock Shared 27
OidGenLock Exclusive 27
XidGenLock Exclusive 27
FirstLockMgrLock Shared 32
CheckpointStartLock Exclusive 73
FreeSpaceLock Exclusive 110
OidGenLock Shared 118
XidGenLock Shared 140
ProcArrayLock Shared 442

Lock Id Combined Time (ns)
WALWriteLock 13900
SInvalLock 15000
RelCacheInitLock 15500
BufMappingLock 18600
FirstLockMgrLock 391100
WALInsertLock 3953700
FreeSpaceLock 4801300
CheckpointStartLock 13131800
ProcArrayLock 14480500
OidGenLock 17736500
XidGenLock 21723100
CheckpointLock 206423500

bash-3.00# echo 850 users - SLIGHT DROP in throughput
850 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
FirstLockMgrLock Exclusive 1
RelCacheInitLock Exclusive 1
SInvalLock Exclusive 1
WALWriteLock Exclusive 1
BufMappingLock Exclusive 3
WALInsertLock Exclusive 7
CheckpointLock Shared 39
CheckpointStartLock Shared 39
OidGenLock Exclusive 39
XidGenLock Exclusive 39
FirstLockMgrLock Shared 47
CheckpointStartLock Exclusive 113
FreeSpaceLock Exclusive 152
OidGenLock Shared 162
XidGenLock Shared 194
ProcArrayLock Shared 621

Lock Id Combined Time (ns)
WALWriteLock 14200
RelCacheInitLock 15100
SInvalLock 15600
BufMappingLock 64100
WALInsertLock 2073200
FirstLockMgrLock 3040300
FreeSpaceLock 7329500
OidGenLock 21619100
CheckpointStartLock 23261300
ProcArrayLock 23917500
XidGenLock 24873100
CheckpointLock 309221200

bash-3.00# echo 900 users - ANOTHER SLIGHT DROP IN THROUGPUT
900 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
WALWriteLock Exclusive 1
WALInsertLock Exclusive 7
CheckpointStartLock Shared 13
OidGenLock Exclusive 13
CheckpointLock Shared 14
XidGenLock Exclusive 14
FirstLockMgrLock Shared 15
FreeSpaceLock Exclusive 51
OidGenLock Shared 51
XidGenLock Shared 62
CheckpointStartLock Exclusive 170
ProcArrayLock Shared 202

Lock Id Combined Time (ns)
WALWriteLock 16800
FirstLockMgrLock 170300
FreeSpaceLock 601500
ProcArrayLock 3971300
WALInsertLock 7757200
OidGenLock 8261900
XidGenLock 18450900
CheckpointStartLock 39155100
CheckpointLock 143751500

bash-3.00# echo 950 users - BIG DROP IN THROUGHPUT
950 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
WALInsertLock Exclusive 3
FirstLockMgrLock Shared 4
CheckpointLock Shared 7
CheckpointStartLock Shared 7
OidGenLock Exclusive 7
XidGenLock Exclusive 7
FreeSpaceLock Exclusive 29
OidGenLock Shared 30
XidGenLock Shared 36
ProcArrayLock Shared 115
CheckpointStartLock Exclusive 134

Lock Id Combined Time (ns)
FirstLockMgrLock 64400
FreeSpaceLock 342300
WALInsertLock 1759600
OidGenLock 4276900
ProcArrayLock 6234300
XidGenLock 6865000
CheckpointStartLock 37590800
CheckpointLock 58994300

bash-3.00# echo 1000 users - STEADY AT PREVIOUS LOW VALUE
1000 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
BufMappingLock Exclusive 1
RelCacheInitLock Exclusive 1
SInvalLock Exclusive 1
WALInsertLock Exclusive 3
CheckpointLock Shared 9
CheckpointStartLock Shared 9
OidGenLock Exclusive 9
XidGenLock Exclusive 9
FirstLockMgrLock Shared 14
FreeSpaceLock Exclusive 33
OidGenLock Shared 37
XidGenLock Shared 44
CheckpointStartLock Exclusive 122
ProcArrayLock Shared 145

Lock Id Combined Time (ns)
RelCacheInitLock 14300
SInvalLock 15600
BufMappingLock 21400
FirstLockMgrLock 184000
FreeSpaceLock 366200
WALInsertLock 1769500
ProcArrayLock 5076500
XidGenLock 5898400
OidGenLock 9244800
CheckpointStartLock 31077500
CheckpointLock 91861900

bash-3.00# echo 1050 users - SMALL INCREASE
1050 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
BufMappingLock Exclusive 2
WALInsertLock Exclusive 9
CheckpointLock Shared 24
XidGenLock Exclusive 24
CheckpointStartLock Shared 25
OidGenLock Exclusive 25
FirstLockMgrLock Shared 30
FreeSpaceLock Exclusive 100
OidGenLock Shared 107
XidGenLock Shared 129
CheckpointStartLock Exclusive 153
ProcArrayLock Shared 400

Lock Id Combined Time (ns)
BufMappingLock 36600
FirstLockMgrLock 420600
FreeSpaceLock 2998400
WALInsertLock 3818300
ProcArrayLock 8986900
OidGenLock 18127200
XidGenLock 18569200
CheckpointStartLock 44795700
CheckpointLock 206488400

bash-3.00# echo 1100 users - SMALL DROP AGAIN
1100 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
BufMappingLock Exclusive 1
WALInsertLock Exclusive 6
CheckpointLock Shared 11
XidGenLock Exclusive 11
CheckpointStartLock Shared 12
OidGenLock Exclusive 12
FirstLockMgrLock Shared 24
FreeSpaceLock Exclusive 39
OidGenLock Shared 44
XidGenLock Shared 51
CheckpointStartLock Exclusive 88
ProcArrayLock Shared 171

Lock Id Combined Time (ns)
BufMappingLock 19500
FirstLockMgrLock 302700
FreeSpaceLock 511200
ProcArrayLock 5042300
WALInsertLock 5592800
CheckpointStartLock 25009900
OidGenLock 25231600
XidGenLock 108045300
CheckpointLock 379734000

bash-3.00# echo 1150 users - STEADY AT LOW VALUE
1150 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
WALWriteLock Exclusive 1
WALInsertLock Exclusive 2
CheckpointLock Shared 5
CheckpointStartLock Shared 6
OidGenLock Exclusive 6
XidGenLock Exclusive 6
FirstLockMgrLock Shared 8
FreeSpaceLock Exclusive 21
OidGenLock Shared 26
XidGenLock Shared 31
ProcArrayLock Shared 93
CheckpointStartLock Exclusive 122

Lock Id Combined Time (ns)
WALWriteLock 14900
WALInsertLock 116900
FirstLockMgrLock 120600
FreeSpaceLock 2177800
XidGenLock 4899200
ProcArrayLock 20721700
CheckpointStartLock 27805200
CheckpointLock 76369300
OidGenLock 470145800

bash-3.00# echo 1250 users - STEADY AT LOW VALUE
1250 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock Id Mode Count
CheckpointLock Shared 2
CheckpointStartLock Shared 2
OidGenLock Exclusive 2
WALInsertLock Exclusive 2
XidGenLock Exclusive 2
FreeSpaceLock Exclusive 9
OidGenLock Shared 10
XidGenLock Shared 12
ProcArrayLock Shared 36
CheckpointStartLock Exclusive 135

Lock Id Combined Time (ns)
WALInsertLock 39500
FreeSpaceLock 98600
ProcArrayLock 318800
XidGenLock 1379900
OidGenLock 3437700
CheckpointLock 9565200
CheckpointStartLock 56547900

bash-3.00#


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-25 23:24:25
Message-ID: 13162.1185405865@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> Here is how I got the numbers.
> I had about 1600 users log in to PostgreSQL. Then I started the run with
> 500 users, and using DTrace I started tracking PostgreSQL locking "as
> viewed from one user/connection". The echo statements indicate how many
> users were active at that point and how throughput was performing. All
> I/O is done on /tmp, which means on a RAM disk.

> bash-3.00# echo 500 users - baseline number
> 500 users
> bash-3.00# ./3_lwlock_acquires.d 19178

> Lock Id Mode Count
> FirstLockMgrLock Exclusive 1
> RelCacheInitLock Exclusive 2
> SInvalLock Exclusive 2
> WALInsertLock Exclusive 10
> BufMappingLock Exclusive 12
> CheckpointLock Shared 29
> CheckpointStartLock Shared 29
> OidGenLock Exclusive 29
> XidGenLock Exclusive 29
> FirstLockMgrLock Shared 33
> CheckpointStartLock Exclusive 78
> FreeSpaceLock Exclusive 114
> OidGenLock Shared 126
> XidGenLock Shared 152
> ProcArrayLock Shared 482

I don't think I believe these numbers. For one thing, CheckpointLock
is simply not ever taken in shared mode. The ratios of counts for
different locks seem pretty improbable too, e.g. there is no way on
earth that the LockMgrLocks are taken more often shared than
exclusive (I would expect no shared acquires at all in the sort of
test you are running). Not to mention that the absolute number of
counts seems way too low. So I think the counting tool is broken.

> Lock Id Combined Time (ns)
> SInvalLock 29800
> RelCacheInitLock 30300
> BufMappingLock 168800
> FirstLockMgrLock 414300
> FreeSpaceLock 1281700
> ProcArrayLock 7869900
> WALInsertLock 11113200
> CheckpointStartLock 13494700
> OidGenLock 25719100
> XidGenLock 26443300
> CheckpointLock 194267800

Combined time of what exactly? It looks like this must be the total
duration the lock is held, at least assuming that the time for
CheckpointLock is correctly reported. It'd be much more useful to see
the total time spent waiting to acquire the lock.

regards, tom lane


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 14:29:51
Message-ID: 46A8AFDF.3050804@sun.com
Lists: pgsql-performance

The count is only for a 10-second snapshot. Plus, remember there are
about 1000 users running, so the connection being profiled only gets
about 0.01 of the period on CPU. And the count is for that CONNECTION
only.

Anyway, using the lock wait script shows the real picture, as you
requested. Here the combined time means time "spent waiting" for the
lock.

bash-3.00# echo 500 users
500 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
OidGenLock Exclusive 1
CheckpointStartLock Shared 3
OidGenLock Shared 4
WALInsertLock Exclusive 7
FreeSpaceLock Exclusive 8
XidGenLock Exclusive 15
CheckpointStartLock Exclusive 16

Lock Id Combined Time (ns)
XidGenLock 3825000
OidGenLock 5307100
WALInsertLock 6317800
FreeSpaceLock 7244100
CheckpointStartLock 22199200

bash-3.00# echo 600 users
600 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
OidGenLock Exclusive 1
WALInsertLock Exclusive 1
CheckpointStartLock Shared 4
CheckpointStartLock Exclusive 11
XidGenLock Exclusive 21

Lock Id Combined Time (ns)
OidGenLock 1728200
WALInsertLock 2040700
XidGenLock 19878500
CheckpointStartLock 24156500

bash-3.00# echo 700 users
700 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
OidGenLock Shared 1
XidGenLock Shared 1
CheckpointStartLock Shared 2
WALInsertLock Exclusive 4
CheckpointStartLock Exclusive 6
FreeSpaceLock Exclusive 6
XidGenLock Exclusive 13

Lock Id Combined Time (ns)
OidGenLock 1730000
WALInsertLock 7253400
FreeSpaceLock 10977700
CheckpointStartLock 13356800
XidGenLock 38220500

bash-3.00# echo 800 users
800 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
CheckpointStartLock Shared 1
WALInsertLock Exclusive 2
XidGenLock Shared 2
CheckpointStartLock Exclusive 5
FreeSpaceLock Exclusive 8
XidGenLock Exclusive 12

Lock Id Combined Time (ns)
WALInsertLock 3746800
FreeSpaceLock 7628900
CheckpointStartLock 11297500
XidGenLock 16649000

bash-3.00# echo 900 users - BIG DROP IN THROUGHPUT OCCURS...
900 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
OidGenLock Exclusive 1
OidGenLock Shared 1
XidGenLock Shared 1
FreeSpaceLock Exclusive 2
WALInsertLock Exclusive 2
CheckpointStartLock Shared 6
XidGenLock Exclusive 12
CheckpointStartLock Exclusive 121

Lock Id Combined Time (ns)
OidGenLock 1968100
FreeSpaceLock 2076300
WALInsertLock 2190400
XidGenLock 20259400
CheckpointStartLock 1407294300

bash-3.00# echo 950 users
950 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
OidGenLock Exclusive 1
OidGenLock Shared 2
CheckpointStartLock Shared 3
WALInsertLock Exclusive 4
FreeSpaceLock Exclusive 5
XidGenLock Exclusive 11
CheckpointStartLock Exclusive 50

Lock Id Combined Time (ns)
WALInsertLock 5577100
FreeSpaceLock 9115900
XidGenLock 13765800
OidGenLock 50155500
CheckpointStartLock 759872200

bash-3.00# echo 1000 users
1000 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
OidGenLock Shared 1
WALInsertLock Exclusive 1
XidGenLock Exclusive 5
CheckpointStartLock Shared 6
CheckpointStartLock Exclusive 102

Lock Id Combined Time (ns)
OidGenLock 21900
WALInsertLock 82500
XidGenLock 3313400
CheckpointStartLock 1448289900

bash-3.00# echo 1050 users
1050 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock Id Mode Count
FreeSpaceLock Exclusive 1
CheckpointStartLock Shared 3
XidGenLock Exclusive 3
CheckpointStartLock Exclusive 146

Lock Id Combined Time (ns)
FreeSpaceLock 18400
XidGenLock 1900900
CheckpointStartLock 2392893700

bash-3.00#

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> Here is how I got the numbers.
>> I had about 1600 users log in to PostgreSQL. Then I started the run with
>> 500 users, and using DTrace I started tracking PostgreSQL locking "as
>> viewed from one user/connection". The echo statements indicate how many
>> users were active at that point and how throughput was performing. All
>> I/O is done on /tmp, which means on a RAM disk.
>>
>
>
>> bash-3.00# echo 500 users - baseline number
>> 500 users
>> bash-3.00# ./3_lwlock_acquires.d 19178
>>
> I don't think I believe these numbers. For one thing, CheckpointLock
> is simply not ever taken in shared mode. The ratios of counts for
> different locks seem pretty improbable too, e.g. there is no way on
> earth that the LockMgrLocks are taken more often shared than
> exclusive (I would expect no shared acquires at all in the sort of
> test you are running). Not to mention that the absolute number of
> counts seems way too low. So I think the counting tool is broken.
>
>
> Combined time of what exactly? It looks like this must be the total
> duration the lock is held, at least assuming that the time for
> CheckpointLock is correctly reported. It'd be much more useful to see
> the total time spent waiting to acquire the lock.
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 14:40:16
Message-ID: 26825.1185460816@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> The count is only for a 10-second snapshot. Plus, remember there are
> about 1000 users running, so the connection being profiled only gets
> about 0.01 of the period on CPU. And the count is for that CONNECTION
> only.

OK, that explains the low absolute levels of the counts, but if the
counts are for a regular backend then there's still a lot of bogosity
here. Backends won't be taking the CheckpointLock at all, nor do they
take CheckpointStartLock in exclusive mode. The bgwriter would do that
but it'd not be taking most of these other locks. So I think the script
is mislabeling the locks somehow.

Also, elementary statistics should tell you that a sample taken as above
is going to have enormous amounts of noise. You should be sampling over
a much longer period, say on the order of a minute of runtime, to have
numbers that are trustworthy.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 15:25:31
Message-ID: 1185463531.4190.72.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, 2007-07-26 at 10:29 -0400, Jignesh K. Shah wrote:
> The count is only for a 10-second snapshot. Plus, remember there are
> about 1000 users running, so the connection being profiled only gets
> about 0.01 of the period on CPU. And the count is for that CONNECTION
> only.

Is that for one process, or all processes aggregated in some way?

> CheckpointStartLock Shared 6
> CheckpointStartLock Exclusive 102

That's definitely whacked. Surely we didn't start 102 checkpoints yet
attempt to commit only 6 times?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 15:27:30
Message-ID: 46A8BD62.2080904@sun.com
Lists: pgsql-performance

I will look for runs with longer samples.

Now, the script could have mislabeled lock names. Anyway, digging into
the one that seems to increase over time, I did stack profiles on how
it increases, and here are some numbers.

For 600-850 users, that potentially mislabeled CheckpointStartLock
(LockID==12) comes from various sources; the top source (while the
system is still doing great) is:

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage_ReadOnly+0xc
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
8202100

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
30822100

However, at 900 users, where the big drop in throughput occurs, it
gives a different top "consumer" of time:

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
406601300

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesUpdate+0x360
postgres`heap_lock_tuple+0x27c
postgres`ExecutePlan+0x33c
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
444523100

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
1661300000

Maybe you all will understand more of what it's doing here than I do.
It looks to me like IndexNext has a problem at high numbers of users,
but I could be wrong.

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> The count is only for a 10-second snapshot. Plus, remember there are
>> about 1000 users running, so the connection being profiled only gets
>> about 0.01 of the period on CPU. And the count is for that CONNECTION
>> only.
>>
>
> OK, that explains the low absolute levels of the counts, but if the
> counts are for a regular backend then there's still a lot of bogosity
> here. Backends won't be taking the CheckpointLock at all, nor do they
> take CheckpointStartLock in exclusive mode. The bgwriter would do that
> but it'd not be taking most of these other locks. So I think the script
> is mislabeling the locks somehow.
>
> Also, elementary statistics should tell you that a sample taken as above
> is going to have enormous amounts of noise. You should be sampling over
> a much longer period, say on the order of a minute of runtime, to have
> numbers that are trustworthy.
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 15:42:48
Message-ID: 27577.1185464568@sss.pgh.pa.us
Lists: pgsql-performance

"Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
> For 600-850 users, that potentially mislabeled CheckpointStartLock
> (LockID==12) comes from various sources; the top source (while the
> system is still doing great) is:

> postgres`LWLockAcquire+0x1c8
> postgres`SimpleLruReadPage_ReadOnly+0xc
> postgres`TransactionIdGetStatus+0x14

That path would be taking CLogControlLock ... so you're off by at least
one. Compare the script to src/include/storage/lwlock.h.
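
A hypothetical illustration of that kind of off-by-one (not the real
lwlock.h contents): remove one member from an enum and every later ID
shifts down by one, so a script carrying the old numbering mislabels
every lock past the removal point.

enum OldLockIds { OLD_BufMappingLock, OLD_LockMgrLock, OLD_OidGenLock };
enum NewLockIds { NEW_LockMgrLock, NEW_OidGenLock, NEW_CLogControlLock };
/* Under the old table, ID 2 means OidGenLock; on the new server, ID 2
 * is CLogControlLock -- the same number now names a different lock. */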

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 15:51:16
Message-ID: 1185465076.4190.87.camel@ebony.site
Lists: pgsql-performance

On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:

> However, at 900 users, where the big drop in throughput occurs, it
> gives a different top "consumer" of time:
>
> postgres`LWLockAcquire+0x1c8
> postgres`SimpleLruReadPage+0x1ac
> postgres`TransactionIdGetStatus+0x14
> postgres`TransactionLogFetch+0x58

TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
Presumably the compiler has been rearranging things??

Looks like you're out of clog buffers. It seems like the clog buffers
aren't big enough to hold clog pages for long enough, and the SELECT FOR
SHARE processing is leaving lots of additional read locks that are
increasing the number of clog requests for older xids.

Try the enclosed patch.
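
The patch, quoted in full in a later reply, is a one-line change in
src/include/access/clog.h:

/* Number of SLRU buffers to use for clog */
#define NUM_CLOG_BUFFERS 64             /* stock value is 8 */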

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
clog_buffers.v1.test.patch text/x-patch 702 bytes

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 19:44:42
Message-ID: 46A8F9AA.4040507@sun.com
Lists: pgsql-performance

BEAUTIFUL!!!

Using the patch I can now go up to 1300 users without dropping. But now
it still repeats at 1300-1350 users.

I corrected the lock descriptions based on what I got from lwlock.h and
retried the whole scalability run with profiling again. This time it
looks like ProcArrayLock:

bash-3.00# echo 600 users
600 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
XidGenLock Exclusive 1
CLogControlLock Shared 2
XidGenLock Shared 2
WALWriteLock Exclusive 4
WALInsertLock Exclusive 8
CLogControlLock Exclusive 9
ProcArrayLock Exclusive 9

Lock Id Combined Time (ns)
WALWriteLock 2842300
XidGenLock 4951000
CLogControlLock 11151800
WALInsertLock 13035600
ProcArrayLock 20040000

bash-3.00# echo 700 users
700 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
XidGenLock Exclusive 1
WALWriteLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Shared 3
CLogControlLock Exclusive 8
WALInsertLock Exclusive 9
ProcArrayLock Exclusive 22

Lock Id Combined Time (ns)
XidGenLock 4093300
WALWriteLock 4914800
WALInsertLock 7389100
ProcArrayLock 10248200
CLogControlLock 11989400

bash-3.00# echo 800 users
800 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
WALWriteLock Exclusive 1
XidGenLock Shared 2
CLogControlLock Shared 3
CLogControlLock Exclusive 7
WALInsertLock Exclusive 7
ProcArrayLock Exclusive 31

Lock Id Combined Time (ns)
WALWriteLock 319100
XidGenLock 5388700
WALInsertLock 9901400
CLogControlLock 13465000
ProcArrayLock 42979700

bash-3.00# echo 900 users
900 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
CLogControlLock Shared 1
XidGenLock Exclusive 1
WALWriteLock Exclusive 2
CLogControlLock Exclusive 6
WALInsertLock Exclusive 9
ProcArrayLock Exclusive 25

Lock Id Combined Time (ns)
XidGenLock 3197700
WALWriteLock 3887100
CLogControlLock 15774500
WALInsertLock 38268700
ProcArrayLock 162073100

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock Id Mode Count
ProcArrayLock Shared 1
ProcArrayLock Exclusive 67

Lock Id Combined Time (ns)
ProcArrayLock 216773800

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1c8
postgres`GetSnapshotData+0x118
postgres`GetTransactionSnapshot+0x5c
postgres`PortalStart+0x150
postgres`exec_bind_message+0x81c
postgres`PostgresMain+0x12b8
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
2779000

postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
213994800

bash-3.00# echo 1000 users - SLIGHT DROP
1000 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
WALWriteLock Exclusive 1
CLogControlLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Shared 3
WALInsertLock Exclusive 4
ProcArrayLock Exclusive 26

Lock Id Combined Time (ns)
WALWriteLock 1807400
XidGenLock 2024000
WALInsertLock 2177500
CLogControlLock 9064700
ProcArrayLock 199216000

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock Id Mode Count
ProcArrayLock Shared 3
ProcArrayLock Exclusive 38

Lock Id Combined Time (ns)
ProcArrayLock 858238600

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdIsInProgress+0x50
postgres`HeapTupleSatisfiesVacuum+0x2ec
postgres`_bt_check_unique+0x2a0
postgres`_bt_doinsert+0x98
postgres`btinsert+0x54
postgres`FunctionCall6+0x44
postgres`index_insert+0x90
postgres`ExecInsertIndexTuples+0x1bc
postgres`ExecUpdate+0x500
postgres`ExecutePlan+0x704
postgres`ExecutorRun+0x60
postgres`PortalRunMulti+0x2a0
postgres`PortalRun+0x310
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
167100

postgres`LWLockAcquire+0x1c8
postgres`GetSnapshotData+0x118
postgres`GetTransactionSnapshot+0x5c
postgres`PortalRunMulti+0x22c
postgres`PortalRun+0x310
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
7125900

postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
850945600

bash-3.00# echo 1100 users - DROP ....
1100 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
CLogControlLock Shared 1
WALWriteLock Exclusive 1
XidGenLock Exclusive 1
ProcArrayLock Shared 2
WALInsertLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Exclusive 4
ProcArrayLock Exclusive 20

Lock Id Combined Time (ns)
WALWriteLock 4179500
XidGenLock 6249400
CLogControlLock 20411100
WALInsertLock 29707600
ProcArrayLock 207923700

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock Id Mode Count
ProcArrayLock Exclusive 40

Lock Id Combined Time (ns)
ProcArrayLock 692242100

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
692242100

bash-3.00#

The LockID for ProcArrayLock is 4, i.e. the 5th entry in lwlock.h,
which seems to confirm the labeling.
Any tweaks for that?

-Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
>
>
>> However, at 900 users, where the big drop in throughput occurs, it
>> gives a different top "consumer" of time:
>>
>
>> postgres`LWLockAcquire+0x1c8
>> postgres`SimpleLruReadPage+0x1ac
>> postgres`TransactionIdGetStatus+0x14
>> postgres`TransactionLogFetch+0x58
>>
>
> TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
> Presumably the compiler has been rearranging things??
>
> Looks like you're out of clog buffers. It seems like the clog buffers
> aren't big enough to hold clog pages for long enough, and the SELECT FOR
> SHARE processing is leaving lots of additional read locks that are
> increasing the number of clog requests for older xids.
>
> Try the enclosed patch.
>
>
> ------------------------------------------------------------------------
>
> Index: src/include/access/clog.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
> retrieving revision 1.19
> diff -c -r1.19 clog.h
> *** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19
> --- src/include/access/clog.h 26 Jul 2007 15:44:58 -0000
> ***************
> *** 29,35 ****
>
>
> /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS 8
>
>
> extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
> --- 29,35 ----
>
>
> /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS 64
>
>
> extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
>
> ------------------------------------------------------------------------


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 19:56:08
Message-ID: 46A8FC58.7060301@sun.com
Lists: pgsql-performance

Tom Lane wrote:
>
> That path would be taking CLogControlLock ... so you're off by at least
> one. Compare the script to src/include/storage/lwlock.h.
>

Indeed, the indexing was off by one due to the removal of
BufMappingLock from src/include/storage/lwlock.h between 8.1 and 8.2,
which the DTrace script had not been updated to reflect.

Thanks,
Robert


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 20:04:50
Message-ID: 1185480291.5510.22.camel@ebony.site
Lists: pgsql-performance

On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote:
> BEAUTIFUL!!!
>
> Using the patch I can now go up to 1300 users without dropping. But now
> it still repeats at 1300-1350 users.

OK, can you try again with 16 and 32 buffers please? We need to know
how many is enough and whether this number needs to be variable via a
parameter, or just slightly larger by default. Thanks.
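
The test is just a recompile with the constant changed in
src/include/access/clog.h, as in the earlier patch:

/* Number of SLRU buffers to use for clog */
#define NUM_CLOG_BUFFERS 16             /* rebuild and rerun; then 32 */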

> I corrected the lock descriptions based on what I got from lwlock.h and
> retried the whole scalability run with profiling again. This time it
> looks like ProcArrayLock:

That's what I would expect with that many users.

> Lock Id Mode Count
> XidGenLock Exclusive 1
> CLogControlLock Shared 2
> XidGenLock Shared 2
> WALWriteLock Exclusive 4
> WALInsertLock Exclusive 8
> CLogControlLock Exclusive 9
> ProcArrayLock Exclusive 9

...but as Tom mentioned, we need to do longer runs now so these counts
get to somewhere in the hundreds so we have some statistical validity.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 21:17:55
Message-ID: 46A90F83.8050401@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Will try 16 and 32 CLOG buffers tomorrow.

But here is the locks data again, with a longer profiling period (about
2 minutes), for the profiled connection with about 2000 users running:

bash-3.00# time ./4_lwlock_waits.d 13583
^C

Lock Id Mode Count
ProcArrayLock Shared 5
XidGenLock Exclusive 13
CLogControlLock Shared 14
XidGenLock Shared 15
CLogControlLock Exclusive 21
WALInsertLock Exclusive 77
WALWriteLock Exclusive 175
ProcArrayLock Exclusive 275

Lock Id Combined Time (ns)
XidGenLock 194966200
WALInsertLock 517955000
CLogControlLock 679665100
WALWriteLock 2838716200
ProcArrayLock 44181002600

Top Wait time seems to come from the following code path for
ProcArrayLock:

Lock Id Mode Count
ProcArrayLock Exclusive 21

Lock Id Combined Time (ns)
ProcArrayLock 5255937500

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1f0
postgres`CommitTransaction+0x104
postgres`CommitTransactionCommand+0xbc
postgres`finish_xact_command+0x78
postgres`exec_execute_message+0x42c
postgres`PostgresMain+0x1838
postgres`BackendRun+0x2f8
postgres`ServerLoop+0x680
postgres`PostmasterMain+0xda8
postgres`main+0x3d0
postgres`_start+0x17c
5255937500

Regards,
Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote:
>
>> BEAUTIFUL!!!
>>
>> Using the patch I can now go up to 1300 users without dropping. But now
>> it still repeats at 1300-1350 users.
>>
>
> OK, can you try again with 16 and 32 buffers please? We need to know
> how many is enough and whether this number needs to be variable via a
> parameter, or just slightly larger by default. Thanks.
>
>
>> I corrected the lock descriptions based on what I got from lwlock.h and
>> retried the whole scalability run with profiling again. This time it
>> looks like ProcArrayLock:
>>
>
> That's what I would expect with that many users.
>
>
>> Lock Id Mode Count
>> XidGenLock Exclusive 1
>> CLogControlLock Shared 2
>> XidGenLock Shared 2
>> WALWriteLock Exclusive 4
>> WALInsertLock Exclusive 8
>> CLogControlLock Exclusive 9
>> ProcArrayLock Exclusive 9
>>
>
> ...but as Tom mentioned, we need to do longer runs now so these counts
> get to somewhere in the hundreds so we have some statistical validity.
>
>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-27 08:58:05
Message-ID: 20070727085805.GF2550@alvh.no-ip.org
Lists: pgsql-performance

Jignesh K. Shah wrote:

> Top Wait time seems to come from the following code path for
> ProcArrayLock:
>
> Lock Id Mode Count
> ProcArrayLock Exclusive 21
>
> Lock Id Combined Time (ns)
> ProcArrayLock 5255937500
>
> Lock Id Combined Time (ns)
>
>
> postgres`LWLockAcquire+0x1f0
> postgres`CommitTransaction+0x104

Yeah, ProcArrayLock is pretty contended. I think it would be kinda neat
if we could split it up into partitions. This lock is quite particular,
though.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-27 09:15:13
Message-ID: 1185527713.4191.20.camel@ebony.site
Lists: pgsql-performance

On Fri, 2007-07-27 at 04:58 -0400, Alvaro Herrera wrote:
> Jignesh K. Shah wrote:
>
> > Top Wait time seems to come from the following code path for
> > ProcArrayLock:
> >
> > Lock Id Mode Count
> > ProcArrayLock Exclusive 21
> >
> > Lock Id Combined Time (ns)
> > ProcArrayLock 5255937500
> >
> > Lock Id Combined Time (ns)
> >
> >
> > postgres`LWLockAcquire+0x1f0
> > postgres`CommitTransaction+0x104
>
> Yeah, ProcArrayLock is pretty contended. I think it would be kinda neat
> if we could split it up into partitions. This lock is quite particular,
> though.

Maybe. If we did, we should set the partitions according to the number
of users, so lower numbers of users are all in one partition.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-27 09:15:27
Message-ID: 1185527727.4191.22.camel@ebony.site
Lists: pgsql-performance

On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
> Lock Id Combined Time (ns)
> XidGenLock 194966200
> WALInsertLock 517955000
> CLogControlLock 679665100
> WALWriteLock 2838716200
> ProcArrayLock 44181002600

Is this the time the lock is held for, or the time that we wait for
that lock? It would be good to see the breakdown of time separately for
shared and exclusive.

Can we have a table like this:
LockId,LockMode,SumTimeLockHeld,SumTimeLockWait

> Top Wait time seems to come from the following code path for
> ProcArrayLock:
>
> Lock Id Mode Count
> ProcArrayLock Exclusive 21
>
> Lock Id Combined Time (ns)
> ProcArrayLock 5255937500
>
> Lock Id Combined Time (ns)
>
>
> postgres`LWLockAcquire+0x1f0
> postgres`CommitTransaction+0x104
> postgres`CommitTransactionCommand+0xbc
> postgres`finish_xact_command+0x78

Well, that's pretty weird. That code path clearly only happens once per
transaction and ought to be fast. The other code paths that take
ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
ought to spend more time holding the lock. Presumably you are running
with a fair number of SERIALIZABLE transactions?
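
For reference, a from-memory paraphrase (not verbatim source) of the two
sides of that lock:

    /* Snapshot side: GetSnapshotData() scans every entry, shared. */
    LWLockAcquire(ProcArrayLock, LW_SHARED);
    for (index = 0; index < arrayP->numProcs; index++)
        ;                       /* collect each backend's xid and xmin */
    LWLockRelease(ProcArrayLock);

    /* Commit side: clears only the caller's own entry, exclusive. */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->xid = InvalidTransactionId;
    MyProc->xmin = InvalidTransactionId;
    LWLockRelease(ProcArrayLock);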

Are you running with commit_delay > 0? It's possible that the call to
CountActiveBackends() is causing pinging of the procarray by other
backends while we're trying to read it during CommitTransaction(). If
so, try the attached patch.
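
For context, the pre-patch logic in xact.c looks like this (the attached
diff simply drops the CountActiveBackends() test):

    /* Every committing backend walks the ProcArray here, just
     * before flushing WAL. */
    if (CommitDelay > 0 && enableFsync &&
        CountActiveBackends() >= CommitSiblings)
        pg_usleep(CommitDelay);

    XLogFlush(recptr);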

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
remove_commit_delay.v1.test.patch text/x-patch 850 bytes

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-27 12:49:57
Message-ID: 46A9E9F5.3020901@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Yes, I can try to break up the shared and exclusive time.

Also, yes, I use commit_delay = 10; it helps out a lot in reducing I/O load.

I will try out the patch soon.

-Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
>
>> Lock Id Combined Time (ns)
>> XidGenLock 194966200
>> WALInsertLock 517955000
>> CLogControlLock 679665100
>> WALWriteLock 2838716200
>> ProcArrayLock 44181002600
>>
>
> Is this the time the lock is held for or the time that we wait for that
> lock? It would be good to see the breakdown of time separately for
> shared and exclusive.
>
> Can we have a table like this:
> LockId,LockMode,SumTimeLockHeld,SumTimeLockWait
>
>
>> Top Wait time seems to come from the following code path for
>> ProcArrayLock:
>>
>> Lock Id Mode Count
>> ProcArrayLock Exclusive 21
>>
>> Lock Id Combined Time (ns)
>> ProcArrayLock 5255937500
>>
>> Lock Id Combined Time (ns)
>>
>>
>> postgres`LWLockAcquire+0x1f0
>> postgres`CommitTransaction+0x104
>> postgres`CommitTransactionCommand+0xbc
>> postgres`finish_xact_command+0x78
>>
>
> Well, that's pretty weird. That code path clearly only happens once per
> transaction and ought to be fast. The other code paths that take
> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
> ought to spend more time holding the lock. Presumably you are running
> with a fair number of SERIALIZABLE transactions?
>
> Are you running with commit_delay > 0? It's possible that the call to
> CountActiveBackends() is causing pinging of the procarray by other
> backends while we're trying to read it during CommitTransaction(). If
> so, try the attached patch.
>
>


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-27 19:11:35
Message-ID: 46AA4367.4020706@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I tried CLOG buffers at 32 and the performance is as good as with 64 (I
haven't tried 16 yet, though). I am going to try your second patch now.

Also, here is the breakup by mode. The combined time is the total time
waited across all the counts.

Lock Id Mode Count
ProcArrayLock Shared 1
CLogControlLock Exclusive 4
CLogControlLock Shared 4
XidGenLock Shared 4
XidGenLock Exclusive 7
WALInsertLock Exclusive 21
WALWriteLock Exclusive 62
ProcArrayLock Exclusive 79

Lock Id Mode Combined Time (ns)
CLogControlLock Exclusive 325200
CLogControlLock Shared 4509200
XidGenLock Exclusive 11839600
ProcArrayLock Shared 40506600
XidGenLock Shared 119013700
WALInsertLock Exclusive 148063100
WALWriteLock Exclusive 347052100
ProcArrayLock Exclusive 1054780600

Here is another one at a higher user count, 1600:

bash-3.00# ./4_lwlock_waits.d 9208

Lock Id Mode Count
CLogControlLock Exclusive 1
CLogControlLock Shared 2
XidGenLock Shared 7
WALInsertLock Exclusive 12
WALWriteLock Exclusive 50
ProcArrayLock Exclusive 82

Lock Id Mode Combined Time (ns)
CLogControlLock Exclusive 27300
XidGenLock Shared 14689300
CLogControlLock Shared 72664900
WALInsertLock Exclusive 101431300
WALWriteLock Exclusive 534357400
ProcArrayLock Exclusive 4110350300
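
Dividing combined time by count gives the average wait: in the first
sample above, ProcArrayLock Exclusive works out to 1054780600 / 79,
roughly 13 ms per wait; at 1600 users it is 4110350300 / 82, roughly
50 ms per wait, so each exclusive waiter is stalled about four times
longer.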

Now I will try with your second patch.

Regards,
Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
>
>> Lock Id Combined Time (ns)
>> XidGenLock 194966200
>> WALInsertLock 517955000
>> CLogControlLock 679665100
>> WALWriteLock 2838716200
>> ProcArrayLock 44181002600
>>
>
> Is this the time the lock is held for or the time that we wait for that
> lock? It would be good to see the breakdown of time separately for
> shared and exclusive.
>
> Can we have a table like this:
> LockId,LockMode,SumTimeLockHeld,SumTimeLockWait
>
>
>> Top Wait time seems to come from the following code path for
>> ProcArrayLock:
>>
>> Lock Id Mode Count
>> ProcArrayLock Exclusive 21
>>
>> Lock Id Combined Time (ns)
>> ProcArrayLock 5255937500
>>
>> Lock Id Combined Time (ns)
>>
>>
>> postgres`LWLockAcquire+0x1f0
>> postgres`CommitTransaction+0x104
>> postgres`CommitTransactionCommand+0xbc
>> postgres`finish_xact_command+0x78
>>
>
> Well, that's pretty weird. That code path clearly only happens once per
> transaction and ought to be fast. The other code paths that take
> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
> ought to spend more time holding the lock. Presumably you are running
> with a fair number of SERIALIZABLE transactions?
>
> Are you running with commit_delay > 0? It's possible that the call to
> CountActiveBackends() is causing pinging of the procarray by other
> backends while we're trying to read it during CommitTransaction(). If
> so, try the attached patch.
>
>
> ------------------------------------------------------------------------
>
> Index: src/backend/access/transam/xact.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
> retrieving revision 1.245
> diff -c -r1.245 xact.c
> *** src/backend/access/transam/xact.c 7 Jun 2007 21:45:58 -0000 1.245
> --- src/backend/access/transam/xact.c 27 Jul 2007 09:09:08 -0000
> ***************
> *** 820,827 ****
> * are fewer than CommitSiblings other backends with active
> * transactions.
> */
> ! if (CommitDelay > 0 && enableFsync &&
> ! CountActiveBackends() >= CommitSiblings)
> pg_usleep(CommitDelay);
>
> XLogFlush(recptr);
> --- 820,826 ----
> * are fewer than CommitSiblings other backends with active
> * transactions.
> */
> ! if (CommitDelay > 0 && enableFsync)
> pg_usleep(CommitDelay);
>
> XLogFlush(recptr);
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-27 20:04:57
Message-ID: 46AA4FE9.2030302@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Using CLOG buffers at 32 and the commit-siblings check patch, I still see
a drop at 1200-1300 users.

bash-3.00# ./4_lwlock_waits.d 18250

Lock Id Mode Count
XidGenLock Shared 1
CLogControlLock Shared 2
ProcArrayLock Shared 2
XidGenLock Exclusive 4
CLogControlLock Exclusive 15
WALInsertLock Exclusive 18
WALWriteLock Exclusive 38
ProcArrayLock Exclusive 77

Lock Id Mode Combined Time (ns)
XidGenLock Shared 88700
WALInsertLock Exclusive 69556000
ProcArrayLock Shared 95656800
XidGenLock Exclusive 139634100
CLogControlLock Exclusive 148822200
CLogControlLock Shared 161630000
WALWriteLock Exclusive 332781800
ProcArrayLock Exclusive 5688265500

bash-3.00# ./4_lwlock_waits.d 18599

Lock Id Mode Count
ProcArrayLock Shared 2
XidGenLock Exclusive 3
XidGenLock Shared 4
CLogControlLock Shared 5
WALInsertLock Exclusive 10
CLogControlLock Exclusive 21
WALWriteLock Exclusive 28
ProcArrayLock Exclusive 54

Lock Id Mode Combined Time (ns)
XidGenLock Exclusive 5688800
WALInsertLock Exclusive 11424700
CLogControlLock Shared 55589100
ProcArrayLock Shared 135220400
WALWriteLock Exclusive 177906900
XidGenLock Shared 524146500
CLogControlLock Exclusive 524563900
ProcArrayLock Exclusive 5828744500

bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599

Lock Id Mode Count
ProcArrayLock Shared 1
ProcArrayLock Exclusive 52

Lock Id Mode Combined Time (ns)
ProcArrayLock Shared 41428300
ProcArrayLock Exclusive 3858386500

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1f0
postgres`GetSnapshotData+0x120
postgres`GetTransactionSnapshot+0x80
postgres`PortalStart+0x198
postgres`exec_bind_message+0x84c
postgres`PostgresMain+0x17f8
postgres`BackendRun+0x2f8
postgres`ServerLoop+0x680
postgres`PostmasterMain+0xda8
postgres`main+0x3d0
postgres`_start+0x17c
Shared 41428300

postgres`LWLockAcquire+0x1f0
postgres`CommitTransaction+0x104
postgres`CommitTransactionCommand+0xbc
postgres`finish_xact_command+0x78
postgres`exec_execute_message+0x42c
postgres`PostgresMain+0x1838
postgres`BackendRun+0x2f8
postgres`ServerLoop+0x680
postgres`PostmasterMain+0xda8
postgres`main+0x3d0
postgres`_start+0x17c
Exclusive 3858386500

-Jignesh


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-30 19:23:42
Message-ID: 46AE3ABE.1090105@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

With CLOG at 16, the drop comes at about 1150 users, with the following
lock stats:
bash-3.00# ./4_lwlock_waits.d 16404

Lock Id Mode Count
ProcArrayLock Shared 2
XidGenLock Exclusive 2
XidGenLock Shared 4
WALInsertLock Exclusive 7
CLogControlLock Shared 8
WALWriteLock Exclusive 46
ProcArrayLock Exclusive 64
CLogControlLock Exclusive 263

Lock Id Mode Combined Time (ns)
XidGenLock Exclusive 528300
ProcArrayLock Shared 968800
WALInsertLock Exclusive 4090900
XidGenLock Shared 73987600
WALWriteLock Exclusive 86200700
ProcArrayLock Exclusive 130756000
CLogControlLock Shared 240471000
CLogControlLock Exclusive 4115158500

So I think 32 is a better option for the CLOG buffers before ProcArrayLock
becomes the bottleneck.
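
In clog.h terms that is just:

    /* Number of SLRU buffers to use for clog */
    #define NUM_CLOG_BUFFERS 32     /* up from the default 8 */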

Though I haven't yet seen what we can do about the ProcArrayLock problem.

Regards,
Jignesh

Jignesh K. Shah wrote:
> Using CLOG buffers at 32 and the commit-siblings check patch, I still see
> a drop at 1200-1300 users.
>
>
>
> bash-3.00# ./4_lwlock_waits.d 18250
>
> Lock Id Mode Count
> XidGenLock Shared 1
> CLogControlLock Shared 2
> ProcArrayLock Shared 2
> XidGenLock Exclusive 4
> CLogControlLock Exclusive 15
> WALInsertLock Exclusive 18
> WALWriteLock Exclusive 38
> ProcArrayLock Exclusive 77
>
> Lock Id Mode Combined Time (ns)
> XidGenLock Shared 88700
> WALInsertLock Exclusive 69556000
> ProcArrayLock Shared 95656800
> XidGenLock Exclusive 139634100
> CLogControlLock Exclusive 148822200
> CLogControlLock Shared 161630000
> WALWriteLock Exclusive 332781800
> ProcArrayLock Exclusive 5688265500
>
> bash-3.00# ./4_lwlock_waits.d 18599
>
> Lock Id Mode Count
> ProcArrayLock Shared 2
> XidGenLock Exclusive 3
> XidGenLock Shared 4
> CLogControlLock Shared 5
> WALInsertLock Exclusive 10
> CLogControlLock Exclusive 21
> WALWriteLock Exclusive 28
> ProcArrayLock Exclusive 54
>
> Lock Id Mode Combined Time (ns)
> XidGenLock Exclusive 5688800
> WALInsertLock Exclusive 11424700
> CLogControlLock Shared 55589100
> ProcArrayLock Shared 135220400
> WALWriteLock Exclusive 177906900
> XidGenLock Shared 524146500
> CLogControlLock Exclusive 524563900
> ProcArrayLock Exclusive 5828744500
>
> bash-3.00#
> bash-3.00# ./6_lwlock_stack.d 4 18599
>
> Lock Id Mode Count
> ProcArrayLock Shared 1
> ProcArrayLock Exclusive 52
>
> Lock Id Mode Combined Time (ns)
> ProcArrayLock Shared 41428300
> ProcArrayLock Exclusive 3858386500
>
> Lock Id Combined Time (ns)
>
>
> postgres`LWLockAcquire+0x1f0
> postgres`GetSnapshotData+0x120
> postgres`GetTransactionSnapshot+0x80
> postgres`PortalStart+0x198
> postgres`exec_bind_message+0x84c
> postgres`PostgresMain+0x17f8
> postgres`BackendRun+0x2f8
> postgres`ServerLoop+0x680
> postgres`PostmasterMain+0xda8
> postgres`main+0x3d0
> postgres`_start+0x17c
> Shared 41428300
>
> postgres`LWLockAcquire+0x1f0
> postgres`CommitTransaction+0x104
> postgres`CommitTransactionCommand+0xbc
> postgres`finish_xact_command+0x78
> postgres`exec_execute_message+0x42c
> postgres`PostgresMain+0x1838
> postgres`BackendRun+0x2f8
> postgres`ServerLoop+0x680
> postgres`PostmasterMain+0xda8
> postgres`main+0x3d0
> postgres`_start+0x17c
> Exclusive 3858386500
>
>
> -Jignesh
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-31 21:33:20
Message-ID: 200707311433.20568.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Simon,

> Well, that's pretty weird. That code path clearly only happens once per
> transaction and ought to be fast. The other code paths that take
> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
> ought to spend more time holding the lock. Presumably you are running
> with a fair number of SERIALIZABLE transactions?

Given that this is a TPCC analog, I'd assume that we are.

Jignesh?

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: josh(at)agliodbs(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-31 21:46:26
Message-ID: 46AFADB2.5070409@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Yep, quite a few transactions. But the piece that's slow is where it
clears things up in CommitTransaction().
I am not sure how ProcArrayLock is designed to work, and hence it's not
clear whether what we are seeing is what we should expect.

Do we have some design doc on ProcArrayLock to understand its purpose?

Thanks.
Regards,
Jignesh

Josh Berkus wrote:
> Simon,
>
>
>> Well, that's pretty weird. That code path clearly only happens once per
>> transaction and ought to be fast. The other code paths that take
>> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
>> ought to spend more time holding the lock. Presumably you are running
>> with a fair number of SERIALIZABLE transactions?
>>
>
> Given that this is a TPCC analog, I'd assume that we are.
>
> Jignesh?
>
>


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: CLOG Patch
Date: 2007-08-03 20:09:39
Message-ID: 46B38B83.1000906@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Simon,

This patch seems to work well (with both the 32 and 64 values, but not
with 16 or the default 8). Is there a way we can integrate this into 8.3?

This will improve out-of-the-box performance quite a bit for high numbers
of users (at least 30% in my OLTP test).

Regards,
Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
>
>
>> However, at 900 users, where the big drop in throughput occurs, it
>> gives a different top "consumer" of time:
>>
>
>
>> postgres`LWLockAcquire+0x1c8
>
>> postgres`SimpleLruReadPage+0x1ac
>> postgres`TransactionIdGetStatus+0x14
>> postgres`TransactionLogFetch+0x58
>>
>
> TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
> Presumably the compiler has been rearranging things??
>
> Looks like you're out of clog buffers. It seems like the clog buffers
> aren't big enough to hold clog pages for long enough and the SELECT FOR
> SHARE processing is leaving lots of additional read locks that are
> increasing the number of clog requests for older xids.
>
> Try the enclosed patch.
>
>
> ------------------------------------------------------------------------
>
> Index: src/include/access/clog.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
> retrieving revision 1.19
> diff -c -r1.19 clog.h
> *** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19
> --- src/include/access/clog.h 26 Jul 2007 15:44:58 -0000
> ***************
> *** 29,35 ****
>
>
> /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS 8
>
>
> extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
> --- 29,35 ----
>
>
> /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS 64
>
>
> extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: CLOG Patch
Date: 2007-08-03 20:29:47
Message-ID: 1186172987.4136.18.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote:

> This patch seems to work well (with both the 32 and 64 values, but not
> with 16 or the default 8).

Could you also test at 24, please? Tom has pointed out the additional
cost of setting this higher, even in workloads that don't benefit from
the I/O-induced contention reduction.

> Is there a way we can integrate this into 8.3?

I just replied to Josh's thread on -hackers about this.

> This will improve out-of-the-box performance quite a bit for high numbers
> of users (at least 30% in my OLTP test)

Yes, that's good. Will this have a dramatic effect on a particular
benchmark, or for what reason might we need this? Tom has questioned the
use case here, so I think it would be good to explain a little more for
everyone. Thanks.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: CLOG Patch
Date: 2007-08-10 17:54:41
Message-ID: 46BCA661.6000404@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I tried with CLOG at 24 also, and I got linear performance up to 1250
users, after which it started to tank. 32 got us to 1350 users before
some other bottleneck overtook it.

Based on what Tom said earlier, it might then make sense to make it a
tunable, with a default of 8 but something one can change for high
numbers of users.
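
As a sketch of that tunable idea only (the variable and its GUC wiring
are invented; no such setting exists today):

    /* Hypothetical: replace the compile-time constant with a variable
     * set once at postmaster start from a new GUC. */
    int NumClogBuffers = 8;             /* default stays at 8 */

    #define NUM_CLOG_BUFFERS NumClogBuffers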

Thanks.
Regards,
Jignesh

Simon Riggs wrote:
> On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote:
>
>
>> This patch seems to work well (with both the 32 and 64 values, but not
>> with 16 or the default 8).
>>
>
> Could you also test at 24, please? Tom has pointed out the additional
> cost of setting this higher, even in workloads that don't benefit from
> the I/O-induced contention reduction.
>
>
>> Is there a way we can integrate this into 8.3?
>>
>
> I just replied to Josh's thread on -hackers about this.
>
>
>> This will improve out-of-the-box performance quite a bit for high numbers
>> of users (at least 30% in my OLTP test)
>>
>
> Yes, that's good. Will this have a dramatic effect on a particular
> benchmark, or for what reason might we need this? Tom has questioned the
> use case here, so I think it would be good to explain a little more for
> everyone. Thanks.
>
>


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject: Re: CLOG Patch
Date: 2007-08-10 20:03:19
Message-ID: 1186776199.4505.39.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, 2007-08-10 at 13:54 -0400, Jignesh K. Shah wrote:
> I tried with CLOG at 24 also, and I got linear performance up to 1250
> users, after which it started to tank. 32 got us to 1350 users before
> some other bottleneck overtook it.

Jignesh,

Thanks for testing that.

It's not very clear to everybody why an extra 100 users is useful, and it
would certainly help your case if you could explain.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com