Re: Sun Donated a Sun Fire T2000 to the PostgreSQL community

Lists: pgsql-hackers, pgsql-performance
From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-16 15:18:58
Message-ID: 4492CBE2.3090509@sun.com


I am thrilled to inform you all that Sun has just donated a fully loaded
T2000 system to the PostgreSQL community. It is being set up by Corey
Shields at OSL (osuosl.org) and should be online early next week. The
system has:

* 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as
  having 32 virtual CPUs, and each can be enabled or disabled individually
* 32 GB of DDR2 SDRAM memory
* 2 x 73 GB internal SAS drives (10,000 RPM)
* 4 Gigabit Ethernet ports

For complete spec, visit
http://www.sun.com/servers/coolthreads/t2000/specifications.jsp

I think this system is well suited for PG scalability testing, among
other things. We did an informal test using an internal OLTP benchmark
and noticed that PG scales to around 8 CPUs. It would be really cool if
all 32 virtual CPUs could be utilized!

Anyways, if you need to access the system for testing purposes, please
contact Josh Berkus.

Regards,

Robert Lor
Sun Microsystems, Inc.
01-510-574-7189


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Lor <Robert(dot)Lor(at)sun(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-16 17:01:56
Message-ID: 200606161001.57142.josh@agliodbs.com

Folks,

> I am thrilled to inform you all that Sun has just donated a fully loaded
> T2000 system to the PostgreSQL community, and it's being setup by Corey
> Shields at OSL (osuosl.org) and should be online probably early next
> week. The system has

So this system will be hosted by the Open Source Lab in Oregon. It's
going to be "donated" to Software in the Public Interest, which will own
it on behalf of the PostgreSQL fund.

We'll want to figure out a system for scheduling performance and
compatibility testing on this machine; I'm not sure exactly how that will
work. Suggestions welcome. As a warning, Gavin Sherry and I already have
a bunch of tests waiting to run.

First thing as soon as I have a login, of course, is to set up a Buildfarm
instance.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-16 22:34:20
Message-ID: 449331EC.10207@tweakers.net

On 16-6-2006 17:18, Robert Lor wrote:
>
> I think this system is well suited for PG scalability testing, among
> others. We did an informal test using an internal OLTP benchmark and
> noticed that PG can scale to around 8 CPUs. Would be really cool if all
> 32 virtual CPUs can be utilized!!!

I can already confirm very good scalability (with our workload) for
PostgreSQL on that machine. We've been testing a 32-thread/16 GB version
and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
(with all four threads enabled).

The hardware threads scale a bit less well, but still pretty well.
Compared with 1 thread per core, enabling 2 or 4 threads per core yields
about 60% and 130% extra performance, respectively.
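
To put the thread-scaling figures in perspective, here is a minimal
sketch that converts them into speedup and per-thread efficiency. It
assumes the figures mean +60% with 2 threads per core and +130% with 4
threads per core, both relative to 1 thread per core; the names used
are illustrative only.

```python
# Extra performance per core relative to running 1 hardware thread,
# as read from the figures above (an interpretation, not measured here).
EXTRA_PERFORMANCE = {1: 0.0, 2: 0.60, 4: 1.30}


def speedup(threads):
    """Throughput relative to a single hardware thread on the same core."""
    return 1.0 + EXTRA_PERFORMANCE[threads]


def efficiency(threads):
    """Fraction of ideal (linear) scaling achieved per hardware thread."""
    return speedup(threads) / threads


for t in sorted(EXTRA_PERFORMANCE):
    print(f"{t} thread(s)/core: speedup {speedup(t):.2f}x, "
          f"efficiency {efficiency(t):.3f}")
```

In other words, four threads per core deliver a bit more than twice the
throughput of one thread, which is well short of linear but still a
substantial gain from the same silicon.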

Best regards,

Arjen


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>, Robert Lor <Robert(dot)Lor(at)sun(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-16 23:24:16
Message-ID: 200606161624.17081.josh@agliodbs.com

Arjen,

> I can already confirm very good scalability (with our workload) on
> postgresql on that machine. We've been testing a 32thread/16G-version
> and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
> (with all four threads enabled).

Keen. We're trying to keep the linear scaling going up to all 32 virtual
CPUs, of course (which doesn't happen at present). Would you be
interested in helping us troubleshoot some of the performance issues?

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: Robert(dot)Lor(at)Sun(dot)COM
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-17 01:15:21
Message-ID: 20060617.101521.95879771.t-ishii@sraoss.co.jp

> I am thrilled to inform you all that Sun has just donated a fully loaded
> T2000 system to the PostgreSQL community, and it's being setup by Corey
> Shields at OSL (osuosl.org) and should be online probably early next
> week. The system has
>
> * 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as
> having 32 virtual CPUs, and each can be enabled or disabled individually
> * 32 GB of DDR2 SDRAM memory
> * 2 @ 73GB internal SAS drives (10000 RPM)
> * 4 Gigabit ethernet ports
>
> For complete spec, visit
> http://www.sun.com/servers/coolthreads/t2000/specifications.jsp
>
> I think this system is well suited for PG scalability testing, among
> others. We did an informal test using an internal OLTP benchmark and
> noticed that PG can scale to around 8 CPUs. Would be really cool if all
> 32 virtual CPUs can be utilized!!!

Interesting. We (some Japanese companies including SRA OSS,
Inc. Japan) did some PG scalability testing using a big 16-CPU
(physical) Unisys machine and found that PG scales up to 8 CPUs. Beyond
8 CPUs, however, PG does not scale any further. The results can be
viewed at the "OSS iPedia" web site (http://ossipedia.ipa.go.jp). From
analyzing the oprofile results, our conclusion was that PG has a serious
lock contention problem in this environment.

You can take a look at the detailed report at:
http://ossipedia.ipa.go.jp/capacity/EV0604210111/
(unfortunately, only Japanese content is available at the
moment; please use an automatic translation service)

The evaluation environment was:
PostgreSQL 8.1.2
OSDL DBT-1 2.1
Miracle Linux 4.0
Unisys ES700, Xeon 2.8 GHz CPU x 16, 16 GB memory (HT off)
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: ishii(at)sraoss(dot)co(dot)jp, Robert(dot)Lor(at)Sun(dot)COM, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-17 02:18:38
Message-ID: 20060617.111838.39487910.t-ishii@sraoss.co.jp

> Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > Interesting. We (some Japanese companies including SRA OSS,
> > Inc. Japan) did some PG scalability testing using a Unisys's big 16
> > (physical) CPU machine and found PG scales up to 8 CPUs. However
> > beyond 8 CPU PG does not scale anymore. The result can be viewed at
> > "OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion was
> > PG has a serious lock contention problem in the environment by
> > analyzing the oprofile result.
>
> 18% in s_lock is definitely bad :-(. Were you able to determine which
> LWLock(s) are accounting for the contention?

Yes. We were interested in that too. Some people did additional tests
to determine that. I don't have the report handy now. I will report
back next week.

> The test case seems to be spending a remarkable amount of time in LIKE
> comparisons, too. That probably is not a representative condition.

I know. I think the point is that the 18% in s_lock only appears with
12 CPUs or more.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: Robert(dot)Lor(at)Sun(dot)COM, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-17 02:34:15
Message-ID: 10226.1150511655@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> Interesting. We (some Japanese companies including SRA OSS,
> Inc. Japan) did some PG scalability testing using a Unisys's big 16
> (physical) CPU machine and found PG scales up to 8 CPUs. However
> beyond 8 CPU PG does not scale anymore. The result can be viewed at
> "OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion was
> PG has a serious lock contention problem in the environment by
> analyzing the oprofile result.

18% in s_lock is definitely bad :-(. Were you able to determine which
LWLock(s) are accounting for the contention?

The test case seems to be spending a remarkable amount of time in LIKE
comparisons, too. That probably is not a representative condition.

regards, tom lane


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-17 04:17:04
Message-ID: 44938240.6040804@sun.com

Arjen van der Meijden wrote:

>
> I can already confirm very good scalability (with our workload) on
> postgresql on that machine. We've been testing a 32thread/16G-version
> and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
> (with all four threads enabled).
>
> The threads are a bit less scalable, but still pretty good. Enabling
> 1, 2 or 4 threads for each core yields resp 60 and 130% extra
> performance.

Wow, what type of workload is it? And did you do much tuning to get
near-linear scalability to 32 threads?

Regards,
-Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Robert(dot)Lor(at)sun(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-17 19:19:40
Message-ID: 200606171219.40374.josh@agliodbs.com

Tom,

> 18% in s_lock is definitely bad :-(.  Were you able to determine which
> LWLock(s) are accounting for the contention?

Gavin Sherry and Tom Daly (Sun) are currently working on identifying the
problem lock using -DLWLOCK_STATS. Any luck, Gavin?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Jim Nasby <jnasby(at)pervasive(dot)com>
To: josh(at)agliodbs(dot)com
Cc: Robert Lor <Robert(dot)Lor(at)sun(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-17 19:53:04
Message-ID: 0590A28B-A34C-43C0-BBF7-2DDF28BABAF6@pervasive.com

On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:

> Folks,
>
>> I am thrilled to inform you all that Sun has just donated a fully
>> loaded
>> T2000 system to the PostgreSQL community, and it's being setup by
>> Corey
>> Shields at OSL (osuosl.org) and should be online probably early next
>> week. The system has
>
> So this system will be hosted by Open Source Lab in Oregon. It's
> going to
> be "donated" to Software In the Public Interest, who will own for the
> PostgreSQL fund.
>
> We'll want to figure out a scheduling system to schedule
> performance and
> compatibility testing on this machine; I'm not sure exactly how
> that will
> work. Suggestions welcome. As a warning, Gavin Sherry and I have
> a bunch
> of pending tests already to run.
>
> First thing as soon as I have a login, of course, is to set up a
> Buildfarm
> instance.
>
> --
> --Josh
>
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
>

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Jim Nasby <decibel(at)decibel(dot)org>
To: josh(at)agliodbs(dot)com
Cc: Robert Lor <Robert(dot)Lor(at)sun(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL community
Date: 2006-06-17 19:54:50
Message-ID: C241B6C6-21B3-4C75-81A7-D543DFC3FC4A@decibel.org

On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
> First thing as soon as I have a login, of course, is to set up a
> Buildfarm
> instance.

Keep in mind that buildfarm clients and benchmarking stuff don't
usually mix well.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jim Nasby <decibel(at)decibel(dot)org>
Cc: josh(at)agliodbs(dot)com, Robert Lor <Robert(dot)Lor(at)sun(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-17 21:46:40
Message-ID: 44947840.8040806@dunslane.net

Jim Nasby wrote:

> On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
>
>> First thing as soon as I have a login, of course, is to set up a
>> Buildfarm
>> instance.
>
>
> Keep in mind that buildfarm clients and benchmarking stuff don't
> usually mix well.
>

On a fast machine like this a buildfarm run is not going to take very
long. You could run those once a day at times of low demand. Or even
once or twice a week.

cheers

andrew


From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Lor <Robert(dot)Lor(at)sun(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-18 09:17:54
Message-ID: 44951A42.6060007@tweakers.net

On 17-6-2006 1:24, Josh Berkus wrote:
> Arjen,
>
>> I can already confirm very good scalability (with our workload) on
>> postgresql on that machine. We've been testing a 32thread/16G-version
>> and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
>> (with all four threads enabled).
>
> Keen. We're trying to keep the linear scaling going up to 32 cores of
> course (which doesn't happen, presently). Would you be interested in
> helping us troubleshoot some of the performance issues?

You can ask your questions; if I happen to know the answer, you're a
step further in the right direction.

But actually, I didn't do much to get this scalability, so I won't be
of much help to you; it's not as if I spent hours on getting this
performance. I just started out with the "normal" attempts to get a good
config. Currently shared buffers is set to 30k. Larger settings didn't
seem to make much difference on our previous 4-core version, so I didn't
even try them on this one. I noticed I also forgot to set the effective
cache size to more than 6G for this one, but since our database is
smaller than that, it shouldn't make any difference. The work memory was
increased a bit to 2K. So there are no magic tricks here.

I do have to add that it's a recent checkout of 8.2devel compiled using
Sun Studio 11. It was compiled using this as CPPFLAGS: -xtarget=ultraT1
-fast -xnolibmopt

The -xnolibmopt was added because we couldn't figure out why it yielded
several linking errors at the end of the compilation when the -xlibmopt
from -fast was enabled, so we disabled that particular setting from the
-fast macro.

The workload generated is an abstraction and simplification of our
website's workload, used for benchmarking. It's basically a news and
price comparison site and it runs on LAMP (with the M of MySQL), i.e. a
lot of light queries, many primary-key or indexed "foreign-key" lookups
for small numbers of records. Some aggregations for summaries, etc.
There are few writes, and hardly any on the most-read tables.
The database easily fits in memory; the total size of the actively read
tables is about 3G.
This PostgreSQL version is not a direct copy of the queries and tables;
I made an effort to make it as PostgreSQL-minded as possible. I.e. I
combined a few queries, I changed "boolean" enums in MySQL to real
booleans in Postgres, I added specific indexes (including partial ones),
etc.

We use apache+php as clients and just open X apache processes using 'ab'
at the same time to generate various amounts of concurrent workloads.
Solaris scales really well to higher concurrencies and PostgreSQL
doesn't seem to have problems with it either in our workload.

So it's not really a real-life scenario, but it's not a synthetic
benchmark either.

Here is a graph of our performance measured on PostgreSQL:
http://achelois.tweakers.net/~acm/pgsql-t2000/T2000-schaling-postgresql.png

What you see are three lines. Each represents the total number of "page
views" processed in 600 seconds for a specific number of Niagara cores
(i.e. 1, 2, 4, 6 and 8). Each core had all its threads enabled, so it's
actually 4, 8, 16, 24 and 32 virtual CPUs you're looking at.
The "Max" line displays the maximum generated "page views" on a specific
core count for any concurrency, respectively: 5, 13, 35, 45 and 60.
The "Bij 50" line (Dutch for "at 50") is the number of "page views"
generated with 50 apache processes working at the same time (on two dual
Xeon machines, so 25 each). I picked 50 somewhat arbitrarily, but all
core configs seemed to do pretty well under that workload.

The "perfect" line is based on the "Max" value for 1 core, simply
multiplied by the number of cores to give a linear reference. The "Bij
50" and the "perfect" lines don't differ much in color, but the top one
is the "perfect" line.

In the near future we'll be presenting an article on this on our
website. Although that will be in Dutch, the graphs should still be easy
for you guys to read.
Because of that, I can't promise much detailed information until then.

I hope this clarifies things a bit; if not, just ask me about it.
Best regards,

Arjen


From: David Roussel <pgsql-performance(at)diroussel(dot)xsmail(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-22 13:03:47
Message-ID: 449A9533.1030103@diroussel.xsmail.com

Arjen van der Meijden wrote:
>
> Here is a graph of our performance measured on PostgreSQL:
> http://achelois.tweakers.net/~acm/pgsql-t2000/T2000-schaling-postgresql.png
>
>
...
>
> The "perfect" line is based on the "Max" value for 1 core and then
> just multiplied by the amount of cores to have a linear reference. The
> "Bij 50" and the "perfect" line don't differ too much in color, but
> the top-one is the "perfect" line.

Surely the 'perfect' line ought to be linear? If the performance were
perfectly linear, then the 'pages generated' ought to be G times the
number of (virtual) processors, where G is the gradient of the graph. In
such a case the graph would go through the origin (0,0), but your graph
does not show this.

I'm a bit confused: what is the 'perfect' line supposed to be?

Thanks

David


From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-22 14:03:25
Message-ID: 449AA32D.1010502@modgraph-usa.com

Arjen van der Meijden wrote:
> First of all, this graph has no origin. Its a bit difficult to test with
> less than one cpu.

Sure it does. I ran all the tests. They all took infinite time, and I got zero results. And my results are 100% accurate and reliable. It's perfectly valid data. :-)

Craig


From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: David Roussel <pgsql-performance(at)diroussel(dot)xsmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-06-22 14:19:21
Message-ID: 449AA6E9.9050804@tweakers.net

On 22-6-2006 15:03, David Roussel wrote:
> Surely the 'perfect' line ought to be linear? If the performance were
> perfectly linear, then the 'pages generated' ought to be G times the
> number of (virtual) processors, where G is the gradient of the graph. In
> such a case the graph would go through the origin (0,0), but your graph
> does not show this.
>
> I'm a bit confused, what is the 'perfect' supposed to be?

First of all, this graph has no origin. It's a bit difficult to test with
less than one CPU.

Anyway, the line actually is linear and would've gone through the
origin, if there was one. What I did was take the level of the
'max'-line at 1 and then multiply it by 2, 4, 6 and 8. So if at 1 the
level would've been 22000, the 2 would be 44000 and the 8 176000.

Please do notice the distance between 1 and 2 on the x-axis is the same
as between 2 and 4, which makes the graph a bit harder to read.
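
The construction of the 'perfect' line can be sketched in a few lines.
This is only an illustration of the arithmetic described above; the
value 22000 is the hypothetical level used in the example, not a
measured result, and the function name is made up for this sketch.

```python
# Core counts that were actually tested (the x-axis of the graph).
CORES = [1, 2, 4, 6, 8]


def perfect_line(max_at_one_core, cores=CORES):
    """Linear reference: the 1-core 'Max' value scaled by the core count."""
    return [max_at_one_core * n for n in cores]


# Hypothetical 1-core level of 22000 page views, as in the example above.
for n, views in zip(CORES, perfect_line(22000)):
    print(f"{n} core(s): {views} page views")
```

Plotted against the real core counts, these points lie on a straight
line through the origin; they only look bent on the graph because the
x-axis spaces 1, 2, 4, 6 and 8 evenly.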

Best regards,

Arjen


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: Robert(dot)Lor(at)Sun(dot)COM, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-13 09:02:34
Message-ID: 20060713.180234.45872055.t-ishii@sraoss.co.jp

> > Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > > Interesting. We (some Japanese companies including SRA OSS,
> > > Inc. Japan) did some PG scalability testing using a Unisys's big 16
> > > (physical) CPU machine and found PG scales up to 8 CPUs. However
> > > beyond 8 CPU PG does not scale anymore. The result can be viewed at
> > > "OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion was
> > > PG has a serious lock contention problem in the environment by
> > > analyzing the oprofile result.
> >
> > 18% in s_lock is definitely bad :-(. Were you able to determine which
> > LWLock(s) are accounting for the contention?
>
> Yes. We were interested in that too. Some people did addtional tests
> to determin that. I don't have the report handy now. I will report
> back next week.

Sorry for the delay. I finally got the oprofile data. It's
huge (34 MB). If you are interested, I can put it somewhere. Please let
me know.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: Robert(dot)Lor(at)Sun(dot)COM, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-13 14:44:19
Message-ID: 13404.1152801859@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
>>> 18% in s_lock is definitely bad :-(. Were you able to determine which
>>> LWLock(s) are accounting for the contention?
>>
>> Yes. We were interested in that too. Some people did addtional tests
>> to determin that. I don't have the report handy now. I will report
>> back next week.

> Sorry for the delay. Finally I got the oprofile data. It's
> huge(34MB). If you are interested, I can put somewhere. Please let me
> know.

Yes, please.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: Robert(dot)Lor(at)Sun(dot)COM, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-20 19:01:58
Message-ID: 8669.1153422118@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
>>> 18% in s_lock is definitely bad :-(. Were you able to determine which
>>> LWLock(s) are accounting for the contention?

> Sorry for the delay. Finally I got the oprofile data. It's
> huge(34MB). If you are interested, I can put somewhere. Please let me
> know.

I finally got a chance to look at this, and it seems clear that all the
traffic is on the BufMappingLock. This is essentially the same problem
we were discussing with respect to Gavin Hamill's report of poor
performance on an 8-way IBM PPC64 box (see hackers archives around
2006-04-21). If your database is fully cached in shared buffers, then
you can do a whole lot of buffer accesses per unit time, and even though
all the BufMappingLock acquisitions are in shared-LWLock mode, the
LWLock's spinlock ends up being heavily contended on an SMP box.

It's likely that CVS HEAD would show somewhat better performance because
of the btree change to cache local copies of index metapages (which
eliminates a fair fraction of buffer accesses, at least in Gavin's test
case). Getting much further than that seems to require partitioning
the buffer mapping table. The last discussion stalled on my concerns
about unpredictable shared memory usage, but I have some ideas on that
which I'll post separately. In the meantime, thanks for sending along
the oprofile data!
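
As a rough illustration of what "partitioning" a mapping table means,
here is a minimal sketch. It is not PostgreSQL's buffer manager code:
the class and method names are hypothetical, and it uses plain mutexes
rather than shared/exclusive LWLocks. The only point it shows is that
lookups for keys in different partitions no longer touch the same lock.

```python
import threading


class PartitionedMap:
    """Hash map split into partitions, each guarded by its own lock."""

    def __init__(self, num_partitions=16):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._tables = [{} for _ in range(num_partitions)]

    def _partition(self, key):
        # The key's hash decides which partition (and which lock) is used.
        return hash(key) % len(self._tables)

    def lookup(self, key):
        p = self._partition(key)
        with self._locks[p]:  # contends only with users of partition p
            return self._tables[p].get(key)

    def insert(self, key, value):
        p = self._partition(key)
        with self._locks[p]:
            self._tables[p][key] = value


# Usage: map a (relation, block) tag to a buffer number.
mapping = PartitionedMap(num_partitions=4)
mapping.insert(("some_table", 42), 7)
print(mapping.lookup(("some_table", 42)))
```

With a single lock, every lookup serializes on one spinlock; with N
partitions and a reasonably uniform hash, only about 1/N of concurrent
lookups compete for any given lock.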

regards, tom lane


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-21 07:56:56
Message-ID: 44C088C8.9050303@sun.com

Tom Lane wrote:

>Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
>
>>>>18% in s_lock is definitely bad :-(. Were you able to determine which
>>>>LWLock(s) are accounting for the contention?
>
>>Sorry for the delay. Finally I got the oprofile data. It's
>>huge(34MB). If you are interested, I can put somewhere. Please let me
>>know.
>
>I finally got a chance to look at this, and it seems clear that all the
>traffic is on the BufMappingLock. This is essentially the same problem
>we were discussing with respect to Gavin Hamill's report of poor
>performance on an 8-way IBM PPC64 box (see hackers archives around
>2006-04-21). If your database is fully cached in shared buffers, then
>you can do a whole lot of buffer accesses per unit time, and even though
>all the BufMappingLock acquisitions are in shared-LWLock mode, the
>LWLock's spinlock ends up being heavily contended on an SMP box.
>
>It's likely that CVS HEAD would show somewhat better performance because
>of the btree change to cache local copies of index metapages (which
>eliminates a fair fraction of buffer accesses, at least in Gavin's test
>case). Getting much further than that seems to require partitioning
>the buffer mapping table. The last discussion stalled on my concerns
>about unpredictable shared memory usage, but I have some ideas on that
>which I'll post separately. In the meantime, thanks for sending along
>the oprofile data!
>
> regards, tom lane
>
>
I ran pgbench and fired up a DTrace script using the lwlock probes we've
added, and it looks like BufMappingLock is the most contended lock, but
CheckpointStartLock is held for a longer duration!

Lock Id              Mode       Count
ControlFileLock      Exclusive      1
SubtransControlLock  Exclusive      1
BgWriterCommLock     Exclusive      6
FreeSpaceLock        Exclusive      6
FirstLockMgrLock     Exclusive     48
BufFreelistLock      Exclusive     74
BufMappingLock       Exclusive     74
CLogControlLock      Exclusive    184
XidGenLock           Exclusive    184
CheckpointStartLock  Shared       185
WALWriteLock         Exclusive    185
ProcArrayLock        Exclusive    368
CLogControlLock      Shared       552
SubtransControlLock  Shared      1273
WALInsertLock        Exclusive   1476
XidGenLock           Shared      1842
ProcArrayLock        Shared      3160
SInvalLock           Shared      3684
BufMappingLock       Shared     14578

Lock Id              Combined Time (ns)
ControlFileLock                    7915
BgWriterCommLock                  43438
FreeSpaceLock                    111139
BufFreelistLock                  448530
FirstLockMgrLock                2879957
CLogControlLock                 4237750
SubtransControlLock             6378042
XidGenLock                      9500422
WALInsertLock                  16372040
SInvalLock                     23284554
ProcArrayLock                  32188638
BufMappingLock                113128512
WALWriteLock                  142391501
CheckpointStartLock          4171106665
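
One way to read the two tables together is to divide the combined time
by the number of acquisitions per lock. The sketch below does that,
assuming "Count" is the number of acquisitions and "Combined Time" is
the total time summed over both modes; the variable names are
illustrative.

```python
from collections import defaultdict

# (lock, mode, count) rows copied from the first table above.
COUNTS = [
    ("ControlFileLock", "Exclusive", 1),
    ("SubtransControlLock", "Exclusive", 1),
    ("BgWriterCommLock", "Exclusive", 6),
    ("FreeSpaceLock", "Exclusive", 6),
    ("FirstLockMgrLock", "Exclusive", 48),
    ("BufFreelistLock", "Exclusive", 74),
    ("BufMappingLock", "Exclusive", 74),
    ("CLogControlLock", "Exclusive", 184),
    ("XidGenLock", "Exclusive", 184),
    ("CheckpointStartLock", "Shared", 185),
    ("WALWriteLock", "Exclusive", 185),
    ("ProcArrayLock", "Exclusive", 368),
    ("CLogControlLock", "Shared", 552),
    ("SubtransControlLock", "Shared", 1273),
    ("WALInsertLock", "Exclusive", 1476),
    ("XidGenLock", "Shared", 1842),
    ("ProcArrayLock", "Shared", 3160),
    ("SInvalLock", "Shared", 3684),
    ("BufMappingLock", "Shared", 14578),
]

# Combined time in nanoseconds, copied from the second table above.
COMBINED_NS = {
    "ControlFileLock": 7915,
    "BgWriterCommLock": 43438,
    "FreeSpaceLock": 111139,
    "BufFreelistLock": 448530,
    "FirstLockMgrLock": 2879957,
    "CLogControlLock": 4237750,
    "SubtransControlLock": 6378042,
    "XidGenLock": 9500422,
    "WALInsertLock": 16372040,
    "SInvalLock": 23284554,
    "ProcArrayLock": 32188638,
    "BufMappingLock": 113128512,
    "WALWriteLock": 142391501,
    "CheckpointStartLock": 4171106665,
}

# Total acquisitions per lock, summed over shared and exclusive modes.
total = defaultdict(int)
for name, _mode, count in COUNTS:
    total[name] += count

# Average time per acquisition, in nanoseconds.
avg_ns = {name: COMBINED_NS[name] / total[name] for name in COMBINED_NS}

for name, avg in sorted(avg_ns.items(), key=lambda item: item[1]):
    print(f"{name:20s} {total[name]:6d} acquisitions {avg:14.0f} ns avg")
```

On these numbers CheckpointStartLock averages roughly 22.5 ms per
acquisition, against under 8 microseconds for BufMappingLock, so the
former is held long but rarely while the latter is taken briefly but
very often.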

Regards,
-Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-21 13:42:45
Message-ID: 21889.1153489365@sss.pgh.pa.us

Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
> I ran pgbench and fired up a DTrace script using the lwlock probes we've
> added, and it looks like BufMappingLock is the most contended lock, but
> CheckpointStartLocks are held for longer duration!

Those numbers look a bit suspicious --- I'd expect to see some of the
LWLocks being taken in both shared and exclusive modes, but you don't
show any such cases. You sure your script is counting correctly?
Also, it'd be interesting to count time spent holding shared lock
separately from time spent holding exclusive.

regards, tom lane


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-21 13:56:53
Message-ID: 20060721135653.GC83250@pervasive.com

On Fri, Jul 21, 2006 at 12:56:56AM -0700, Robert Lor wrote:
> I ran pgbench and fired up a DTrace script using the lwlock probes we've
> added, and it looks like BufMappingLock is the most contended lock, but
> CheckpointStartLocks are held for longer duration!

Not terribly surprising given that that lock can generate a substantial
amount of IO (though looking at the numbers, you might want to make
bgwriter more aggressive). Also, that's a shared lock, so it won't have
nearly the impact that BufMappingLock does.

> Lock Id Mode Count
> ControlFileLock Exclusive 1
> SubtransControlLock Exclusive 1
> BgWriterCommLock Exclusive 6
> FreeSpaceLock Exclusive 6
> FirstLockMgrLock Exclusive 48
> BufFreelistLock Exclusive 74
> BufMappingLock Exclusive 74
> CLogControlLock Exclusive 184
> XidGenLock Exclusive 184
> CheckpointStartLock Shared 185
> WALWriteLock Exclusive 185
> ProcArrayLock Exclusive 368
> CLogControlLock Shared 552
> SubtransControlLock Shared 1273
> WALInsertLock Exclusive 1476
> XidGenLock Shared 1842
> ProcArrayLock Shared 3160
> SInvalLock Shared 3684
> BufMappingLock Shared 14578
>
> Lock Id Combined Time (ns)
> ControlFileLock 7915
> BgWriterCommLock 43438
> FreeSpaceLock 111139
> BufFreelistLock 448530
> FirstLockMgrLock 2879957
> CLogControlLock 4237750
> SubtransControlLock 6378042
> XidGenLock 9500422
> WALInsertLock 16372040
> SInvalLock 23284554
> ProcArrayLock 32188638
> BufMappingLock 113128512
> WALWriteLock 142391501
> CheckpointStartLock 4171106665
>
>
> Regards,
> -Robert

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Sven Geisler <sgeisler(at)aeccom(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-21 14:59:49
Message-ID: 44C0EBE5.8010508@aeccom.com
Lists: pgsql-hackers pgsql-performance

Hi,

Tom Lane wrote:
> Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
>> I ran pgbench and fired up a DTrace script using the lwlock probes we've
>> added, and it looks like BufMappingLock is the most contended lock, but
>> CheckpointStartLocks are held for longer duration!
>
> Those numbers look a bit suspicious --- I'd expect to see some of the
> LWLocks being taken in both shared and exclusive modes, but you don't
> show any such cases. You sure your script is counting correctly?
> Also, it'd be interesting to count time spent holding shared lock
> separately from time spent holding exclusive.

Is there a test case that shows the contention for fully cached
tables? It would be nice to have measurable numbers such as context
switches and queries per second.

Sven.


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-21 15:11:58
Message-ID: 44C0EEBE.3030707@sun.com
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:

>Those numbers look a bit suspicious --- I'd expect to see some of the
>LWLocks being taken in both shared and exclusive modes, but you don't
>show any such cases. You sure your script is counting correctly?
>
>
I'll double check to make sure no stupid mistakes were made!

>Also, it'd be interesting to count time spent holding shared lock
>separately from time spent holding exclusive.
>
>
Will provide that data later today.

Regards,
-Robert


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-22 03:05:28
Message-ID: 44C195F8.9010409@sun.com
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:

>Also, it'd be interesting to count time spent holding shared lock
>separately from time spent holding exclusive.
>
>
>
Tom,

Here is the breakdown between exclusive and shared LWLocks. Do the
numbers look reasonable to you?

Regards,
-Robert

bash-3.00# time ./Tom_lwlock_acquire.d `pgrep -n postgres`
********** LWLock Count: Exclusive **********
Lock Id               Mode       Count
ControlFileLock       Exclusive      1
FreeSpaceLock         Exclusive      9
XidGenLock            Exclusive    202
CLogControlLock       Exclusive    203
WALWriteLock          Exclusive    203
BgWriterCommLock      Exclusive    222
BufFreelistLock       Exclusive    305
BufMappingLock        Exclusive    305
ProcArrayLock         Exclusive    405
FirstLockMgrLock      Exclusive    670
WALInsertLock         Exclusive   1616

********** LWLock Count: Shared **********
Lock Id               Mode       Count
CheckpointStartLock   Shared       202
CLogControlLock       Shared       450
SubtransControlLock   Shared       776
XidGenLock            Shared      2020
ProcArrayLock         Shared      3778
SInvalLock            Shared      4040
BufMappingLock        Shared     40838

********** LWLock Time: Exclusive **********
Lock Id               Combined Time (ns)
ControlFileLock                     8301
FreeSpaceLock                      80590
CLogControlLock                  1603557
BgWriterCommLock                 1607122
BufFreelistLock                  1997406
XidGenLock                       2312442
BufMappingLock                   3161683
FirstLockMgrLock                 5392575
ProcArrayLock                    6034396
WALInsertLock                   12277693
WALWriteLock                   324869744

********** LWLock Time: Shared **********
Lock Id               Combined Time (ns)
CLogControlLock                  3183788
SubtransControlLock              6956229
XidGenLock                      12012576
SInvalLock                      35567976
ProcArrayLock                   45400779
BufMappingLock                 300669441
CheckpointStartLock           4056134243

real 0m24.718s
user 0m0.382s
sys 0m0.181s


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-23 18:10:26
Message-ID: 3706.1153678226@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
> Here is the breakdown between exclusive and shared LWLocks. Do the
> numbers look reasonable to you?

Yeah, those seem plausible, although the hold time for
CheckpointStartLock seems awfully high --- about 20 msec
per transaction. Are you using a nonzero commit_delay?

regards, tom lane
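
The "about 20 msec" figure can be reproduced from Robert's shared-lock tables by dividing the combined hold time by the acquisition count. A short Python sketch of that arithmetic follows. It assumes, as the message above appears to, that each of the 202 CheckpointStartLock acquisitions corresponds to one transaction; the variable names are illustrative only.

```python
# Figures copied from the "LWLock Count: Shared" and "LWLock Time: Shared"
# tables in Robert's message.
checkpoint_start_count = 202
checkpoint_start_hold_ns = 4056134243

# Assumption: one CheckpointStartLock acquisition per transaction.
mean_hold_ms = checkpoint_start_hold_ns / checkpoint_start_count / 1e6

print(f"CheckpointStartLock mean hold: {mean_hold_ms:.2f} ms per acquisition")
```

The same division applied to any other row gives a per-acquisition hold time for that lock, which is easier to compare across locks than the combined totals.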


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: Robert(dot)Lor(at)Sun(dot)COM, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-23 23:52:16
Message-ID: 28596.1153698736@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
>>> Interesting. We (some Japanese companies including SRA OSS,
>>> Inc. Japan) did some PG scalability testing using a Unisys's big 16
>>> (physical) CPU machine and found PG scales up to 8 CPUs. However
>>> beyond 8 CPU PG does not scale anymore. The result can be viewed at
>>> "OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion was
>>> PG has a serious lock contention problem in the environment by
>>> analyzing the oprofile result.

Can you retry this test case using CVS tip? I'm curious to see if
having partitioned the BufMappingLock helps ...

regards, tom lane


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-24 00:52:12
Message-ID: 44C419BC.1050503@sun.com
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:

>Yeah, those seem plausible, although the hold time for
>CheckpointStartLock seems awfully high --- about 20 msec
>per transaction. Are you using a nonzero commit_delay?
>
>
>
>
I didn't change commit_delay, which defaults to zero.

Regards,
-Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-24 01:29:39
Message-ID: 29920.1153704579@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
> Tom Lane wrote:
>> Yeah, those seem plausible, although the hold time for
>> CheckpointStartLock seems awfully high --- about 20 msec
>> per transaction. Are you using a nonzero commit_delay?
>>
> I didn't change commit_delay, which defaults to zero.

Hmmm ... AFAICS this must mean that flushing the WAL data to disk
at transaction commit time takes (most of) 20 msec on your hardware.
Which still seems high --- on most modern disks that'd be at least two
disk revolutions, maybe more. What's the disk hardware you're testing
on, particularly its RPM spec?

regards, tom lane
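
The disk-revolution estimate follows from the rotation period, which is 60,000 ms divided by the RPM. The Python sketch below works this out for a 20 msec interval. The 5400 and 10000 RPM values appear elsewhere in this thread; 7200 RPM is included only as an assumed common desktop spec, and the function names are illustrative.

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def revolutions_in(interval_ms, rpm):
    """Number of platter revolutions that fit into the given interval."""
    return interval_ms / revolution_ms(rpm)


flush_ms = 20  # approximate per-transaction hold time discussed above

for rpm in (5400, 7200, 10000):
    print(
        f"{rpm:>5} RPM: {revolution_ms(rpm):5.2f} ms per revolution, "
        f"{revolutions_in(flush_ms, rpm):.1f} revolutions in {flush_ms} ms"
    )
```

This only accounts for rotational delay; seek time and any write caching on the drive are not modeled.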


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sun Donated a Sun Fire T2000 to the PostgreSQL
Date: 2006-07-24 03:34:25
Message-ID: 44C43FC1.4050002@sun.com
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:

>Hmmm ... AFAICS this must mean that flushing the WAL data to disk
>at transaction commit time takes (most of) 20 msec on your hardware.
>Which still seems high --- on most modern disks that'd be at least two
>disk revolutions, maybe more. What's the disk hardware you're testing
>on, particularly its RPM spec?
>
>
I actually ran the test on my laptop. It has an Ultra ATA/100 drive
(5400 rpm). The test was just a quickie to show some data from the
probes. I'll collect and share data from the T2000 server later.

Regards,
-Robert