Re: Wierd context-switching issue on Xeon

Lists:	pgsql-performance

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Wierd context-switching issue on Xeon
Date:	2003-11-25 22:19:36
Message-ID:	200311251419.36771.josh@agliodbs.com

Folks,

We're seeing some odd issues with hyperthreading-capable Xeons, whether or not
hyperthreading is enabled. Basically, when a small number of really heavy-duty
queries hit the system and push all of the CPUs to more than 70% used
(about 1/2 user & 1/2 kernel), the system goes to 100,000+ context switches
per second and performance degrades.

I know that there's other Xeon users on this list ... has anyone else seen
anything like that? The machines are Dells running Red Hat 7.3.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2003-11-25 23:37:42
Message-ID:	200311251537.42527.josh@agliodbs.com

Tom,

> Strictly a WAG ... but what this sounds like to me is disastrously bad
> behavior of the spinlock code under heavy contention. We thought we'd
> fixed the spinlock code for SMP machines awhile ago, but maybe
> hyperthreading opens some new vistas for misbehavior ...

Yeah, I thought of that based on the discussion on -Hackers. But we tried
turning off hyperthreading, with no change in behavior.

> If you can't try 7.4, or want to gather more data first, it would be
> good to try to confirm or disprove the theory that the context switches
> are coming from spinlock delays. If they are, they'd be coming from the
> select() calls in s_lock() in s_lock.c. Can you strace or something to
> see what kernel calls the context switches occur on?

Might be worth it ... will suggest that. Will also try 7.4.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	josh(at)agliodbs(dot)com
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2003-11-25 23:40:47
Message-ID:	9470.1069803647@sss.pgh.pa.us

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> We're seeing some odd issues with hyperthreading-capable Xeons, whether or not
> hyperthreading is enabled. Basically, when a small number of really heavy-duty
> queries hit the system and push all of the CPUs to more than 70% used
> (about 1/2 user & 1/2 kernel), the system goes to 100,000+ context switches
> per second and performance degrades.

Strictly a WAG ... but what this sounds like to me is disastrously bad
behavior of the spinlock code under heavy contention. We thought we'd
fixed the spinlock code for SMP machines awhile ago, but maybe
hyperthreading opens some new vistas for misbehavior ...

> I know that there's other Xeon users on this list ... has anyone else seen
> anything like that? The machines are Dells running Red Hat 7.3.

What Postgres version? Is it easy for you to try 7.4? If we were
really lucky, the random-backoff algorithm added late in 7.4 development
would cure this.

If you can't try 7.4, or want to gather more data first, it would be
good to try to confirm or disprove the theory that the context switches
are coming from spinlock delays. If they are, they'd be coming from the
select() calls in s_lock() in s_lock.c. Can you strace or something to
see what kernel calls the context switches occur on?

Another line of thought is that RH 7.3 is a long ways back, and it
wasn't so very long ago that Linux still had lots of SMP bugs. Maybe
what you really need is a kernel update?

regards, tom lane
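
To make the spin-then-back-off idea in the message above concrete, here is a
minimal Python sketch. It is not the code from s_lock.c: the class name, the
parameters (spins_per_delay, min_delay, max_delay) and the growth rule are
assumptions chosen for illustration, and threading.Lock's non-blocking
acquire merely stands in for an atomic test-and-set instruction.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Toy spinlock: spin briefly, then sleep with a randomized, growing delay."""

    def __init__(self, spins_per_delay=100, min_delay=0.001, max_delay=0.1):
        # Assumption: a non-blocking Lock.acquire() plays the role of test-and-set.
        self._flag = threading.Lock()
        self.spins_per_delay = spins_per_delay
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.sleeps = 0  # approximate count of sleeps (updated without locking)

    def _tas(self):
        """Return True if we took the lock, False if it was already held."""
        return self._flag.acquire(blocking=False)

    def acquire(self):
        delay = self.min_delay
        spins = 0
        while not self._tas():
            spins += 1
            if spins >= self.spins_per_delay:
                # Giving up the CPU here is what the OS reports as a context switch.
                self.sleeps += 1
                time.sleep(delay)
                # Grow the delay by a random factor between 1x and 2x, capped,
                # so that waiters do not all wake up at the same instant.
                delay = min(self.max_delay, delay * (1.0 + random.random()))
                spins = 0

    def release(self):
        self._flag.release()
```

If several threads call acquire() and release() around a shared counter, the
sleeps attribute gives a rough idea of how often contention forced a waiter
off the CPU, which is the quantity the strace suggestion is trying to observe
on the real system.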

From:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Cc:	Sven Geisler <sgeisler(at)aeccom(dot)com>
Subject:	RESOLVED: Re: Wierd context-switching issue on Xeon
Date:	2004-04-16 13:03:28
Message-ID:	407FD9A0.6040608@aeccom.com

Tom, Josh,

I think we have the problem resolved after I found the following note
from Tom:

> A large number of semops may mean that you have excessive contention
> on some lockable resource, but I don't have enough info to guess what
> resource.

This was the key to look at: we were missing all indices on a table which
is used heavily and does lots of locking. After recreating the missing
indices the production system performed normally: no more excessive
semop() calls, load way below 1.0, and CS over 20,000 very rare (mostly in
the thousands or less).

This is quite a relief, but I am sorry that the problem was so stupid and
that you wasted some time, although Tom said he had also seen excessive
semop() calls on another Dual XEON system.

Hyperthreading has been turned off so far but will be turned on again in the
next few days. I don't expect any problems then.

I'm not sure if this semop() problem is still an issue, but the database
behaves a bit out of bounds in this situation, i.e. consuming 95% of system
resources with semop() calls while tables are locked very often and for
longer periods.

Thanks for your help,

Dirk

Finally, here is a current vmstat 1 excerpt, taken now that the problem has
been resolved:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd    free    buff    cache  si  so   bi   bo    in    cs  us  sy   id  wa
 1  0  2308  232508  201924  6976532   0   0  136  464   628   812   5   1   94   0
 0  0  2308  232500  201928  6976628   0   0   96  296   495   484   4   0   95   0
 0  1  2308  232492  201928  6976628   0   0    0  176   347   278   1   0   99   0
 0  0  2308  233484  201928  6976596   0   0   40  580   443   351   8   2   90   0
 1  0  2308  233484  201928  6976696   0   0   76  692   792   651   9   2   88   0
 0  0  2308  233484  201928  6976696   0   0    0   20   132    34   0   0  100   0
 0  0  2308  233484  201928  6976696   0   0    0   76   177    90   0   0  100   0
 0  1  2308  233484  201928  6976696   0   0    0  216   321   250   4   0   96   0
 0  0  2308  233484  201928  6976696   0   0    0  116   417   240   8   0   92   0
 0  0  2308  233484  201928  6976784   0   0   48  600   403   270   8   0   92   0
 0  0  2308  233464  201928  6976860   0   0   76  452  1064  2611  14   1   84   0
 0  0  2308  233460  201932  6976900   0   0   32  256   587   587  12   1   87   0
 0  0  2308  233460  201932  6976932   0   0   32  188   379   287   5   0   94   0
 0  0  2308  233460  201932  6976932   0   0    0    0   103     8   0   0  100   0
 0  0  2308  233460  201932  6976932   0   0    0    0   102    14   0   0  100   0
 0  1  2308  233444  201948  6976932   0   0    0  348   300   180   1   0   99   0
 1  0  2308  233424  201948  6976948   0   0   16  380   739   906   4   2   93   0
 0  0  2308  233424  201948  6977032   0   0   68  260   724   987   7   0   92   0
 0  0  2308  231924  201948  6977128   0   0   96  344  1130   753  11   1   88   0
 1  0  2308  231924  201948  6977248   0   0  112  324   687   628   3   0   97   0
 0  0  2308  231924  201948  6977248   0   0    0  192   575   430   5   0   95   0
 1  0  2308  231924  201948  6977248   0   0    0  264   208   124   0   0  100   0
 0  0  2308  231924  201948  6977264   0   0   16  272   380   230   3   2   95   0
 0  0  2308  231924  201948  6977264   0   0    0    0   104     8   0   0  100   0
 0  0  2308  231924  201948  6977264   0   0    0   48   258    92   1   0   99   0
 0  0  2308  231816  201948  6977484   0   0  212  268   456   384   2   0   98   0
 0  0  2308  231816  201948  6977484   0   0    0   88   453   770   0   0   99   0
 0  0  2308  231452  201948  6977680   0   0  196  476   615   676   5   0   94   0
 0  0  2308  231452  201948  6977680   0   0    0  228   431   400   2   0   98   0
 0  0  2308  231452  201948  6977680   0   0    0    0   237    58   3   0   97   0
 0  0  2308  231448  201952  6977680   0   0    0    0   365    84   2   0   97   0
 0  0  2308  231448  201952  6977680   0   0    0   40   246   108   1   0   99   0
 0  0  2308  231448  201952  6977776   0   0   96  352   606  1026   4   2   94   0
 0  0  2308  231448  201952  6977776   0   0    0  240   295   266   5   0   95   0
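
To compare an excerpt like this against the 100,000+ context switches per
second reported at the start of the thread, the cs column can be summarized
with a few lines of Python. This is only a sketch: it assumes each vmstat
sample sits on a single line with the 16 columns r, b, swpd, free, buff,
cache, si, so, bi, bo, in, cs, us, sy, id, wa, and the function names are
made up for illustration. The three sample rows are copied from the excerpt.

```python
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]


def parse_vmstat(text):
    """Turn vmstat data rows into dicts, skipping header and malformed lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(VMSTAT_COLUMNS):
            continue
        if not all(field.isdigit() for field in fields):
            continue  # header line or garbage
        rows.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    return rows


def summarize_cs(rows):
    """Return (average, peak) context switches per second."""
    if not rows:
        raise ValueError("no vmstat samples to summarize")
    values = [row["cs"] for row in rows]
    return sum(values) / len(values), max(values)


sample = """\
 r  b  swpd   free   buff   cache si so  bi  bo   in   cs us sy  id wa
 1  0  2308 232508 201924 6976532  0  0 136 464  628  812  5  1  94  0
 0  0  2308 233464 201928 6976860  0  0  76 452 1064 2611 14  1  84  0
 0  0  2308 233460 201932 6976932  0  0   0   0  103    8  0  0 100  0
"""
rows = parse_vmstat(sample)
average, peak = summarize_cs(rows)
print(f"{len(rows)} samples, average cs/s = {average:.1f}, peak cs/s = {peak}")
```

Feeding the whole excerpt through parse_vmstat() instead of the three sample
rows works the same way, as long as the wrapped lines are joined first.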

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
Cc:	josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org, Sven Geisler <sgeisler(at)aeccom(dot)com>
Subject:	Re: RESOLVED: Re: Wierd context-switching issue on Xeon
Date:	2004-04-16 13:49:38
Message-ID:	28603.1082123378@sss.pgh.pa.us

Dirk Lutzebäck <lutzeb(at)aeccom(dot)com> writes:
> This was the key to look at: we were missing all indices on a table which
> is used heavily and does lots of locking. After recreating the missing
> indices the production system performed normally: no more excessive
> semop() calls, load way below 1.0, and CS over 20,000 very rare (mostly in
> the thousands or less).

Hmm ... that's darn interesting. AFAICT the test case I am looking at
for Josh's client has no such SQL-level problem ... but I will go back
and double check ...

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Cc:	Sven Geisler <sgeisler(at)aeccom(dot)com>
Subject:	Re: RESOLVED: Re: Wierd context-switching issue on Xeon
Date:	2004-04-16 16:58:14
Message-ID:	200404160958.14902.josh@agliodbs.com

Dirk,

> I'm not sure if this semop() problem is still an issue, but the database
> behaves a bit out of bounds in this situation, i.e. consuming 95% of system
> resources with semop() calls while tables are locked very often and for
> longer periods.

It would be helpful to us if you could test this with the indexes disabled on
the non-Bigmem system. I'd like to eliminate Bigmem as a factor, if
possible.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Enterprise vertical business         josh(at)agliodbs(dot)com
    and data analysis solutions         (415) 752-2387
    and database optimization           fax 651-9224
  utilizing Open Source technology      San Francisco

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	lutzeb(at)aeccom(dot)com
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-18 21:47:41
Message-ID:	11437.1082324861@sss.pgh.pa.us

After some further digging I think I'm starting to understand what's up
here, and the really fundamental answer is that a multi-CPU Xeon MP box
sucks for running Postgres.

I did a bunch of oprofile measurements on a machine belonging to one of
Josh's clients, using a test case that involved heavy concurrent access
to a relatively small amount of data (little enough to fit into Postgres
shared buffers, so that no I/O or kernel calls were really needed once
the test got going). I found that by nearly any measure --- elapsed
time, bus transactions, or machine-clear events --- the spinlock
acquisitions associated with grabbing and releasing the BufMgrLock took
an unreasonable fraction of the time. I saw about 15% of elapsed time,
40% of bus transactions, and nearly 100% of pipeline-clear cycles going
into what is essentially two instructions out of the entire backend.
(Pipeline clears occur when the cache coherency logic detects a memory
write ordering problem.)

I am not completely clear on why this machine-level bottleneck manifests
as a lot of context swaps at the OS level. I think what is happening is
that because SpinLockAcquire is so slow, a process is much more likely
than you'd normally expect to arrive at SpinLockAcquire while another
process is also acquiring the spinlock. This puts the two processes
into a "lockstep" condition where the second process is nearly certain
to observe the BufMgrLock as locked, and be forced to suspend itself,
even though the time the first process holds the BufMgrLock is not
really very long at all.

If you google for Xeon and "cache coherency" you'll find quite a bit of
suggestive information about why this might be more true on the Xeon
setup than others. A couple of interesting hits:

http://www.theinquirer.net/?article=10797
says that Xeon MP uses a *slower* FSB than Xeon DP. This would
translate directly to more time needed to transfer a dirty cache line
from one processor to the other, which is the basic operation that we're
talking about here.

http://www.aceshardware.com/Spades/read.php?article_id=30000187
says that Opterons use a different cache coherency protocol that is
fundamentally superior to the Xeon's, because dirty cache data can be
transferred directly between two processor caches without waiting for
main memory.

So in the short term I think we have to tell people that Xeon MP is not
the most desirable SMP platform to run Postgres on. (Josh thinks that
the specific motherboard chipset being used in these machines might
share some of the blame too. I don't have any evidence for or against
that idea, but it's certainly possible.)

In the long run, however, CPUs continue to get faster than main memory
and the price of cache contention will continue to rise. So it seems
that we need to give up the assumption that SpinLockAcquire is a cheap
operation. In the presence of heavy contention it won't be.

One thing we probably have got to do soon is break up the BufMgrLock
into multiple finer-grain locks so that there will be less contention.
However I am wary of doing this incautiously, because if we do it in a
way that makes for a significant rise in the number of locks that have
to be acquired to access a buffer, we might end up with a net loss.

I think Neil Conway was looking into how the bufmgr might be
restructured to reduce lock contention, but if he had come up with
anything he didn't mention exactly what. Neil?

regards, tom lane
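
As an illustration of the "multiple finer-grain locks" idea raised in the
message above, here is a small Python sketch of lock partitioning (sometimes
called lock striping). It is not PostgreSQL's buffer manager and not a
proposal from the thread: the class name, the use of zlib.crc32 as the hash,
and the default of 16 partitions are all assumptions. The one property it
tries to respect is the caveat stated above, namely that each access still
takes exactly one lock.

```python
import threading
import zlib


class PartitionedBufferTable:
    """Toy pin-count table split into partitions, each with its own lock."""

    def __init__(self, num_partitions=16):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._pins = [dict() for _ in range(num_partitions)]

    def _partition(self, tag):
        # crc32 of the tag's repr is stable across runs, unlike hash() on str.
        return zlib.crc32(repr(tag).encode()) % len(self._locks)

    def pin(self, tag):
        """Increment and return the pin count for one buffer tag."""
        part = self._partition(tag)
        with self._locks[part]:  # one lock per access, never more
            count = self._pins[part].get(tag, 0) + 1
            self._pins[part][tag] = count
            return count

    def unpin(self, tag):
        """Decrement and return the pin count; raise if the tag is not pinned."""
        part = self._partition(tag)
        with self._locks[part]:
            count = self._pins[part].get(tag, 0)
            if count == 0:
                raise ValueError(f"{tag!r} is not pinned")
            count -= 1
            if count:
                self._pins[part][tag] = count
            else:
                del self._pins[part][tag]
            return count

    def pin_count(self, tag):
        part = self._partition(tag)
        with self._locks[part]:
            return self._pins[part].get(tag, 0)
```

Two backends touching tags that hash to different partitions never contend
on the same lock, whereas with num_partitions=1 the table degenerates to the
single global lock described above.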

From:	Dave Cramer <pg(at)fastcrypt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-18 23:34:41
Message-ID:	1082331281.1557.47.camel@localhost.localdomain

So the kernel/OS is irrelevant here? This happens on any dual Xeon?

What about hyperthreading: does it still happen if HTT is turned off?

Dave
--
Dave Cramer
519 939 0336
ICQ # 14675561

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 00:40:35
Message-ID:	87d6647qbg.fsf@stark.xeocode.com

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> So in the short term I think we have to tell people that Xeon MP is not
> the most desirable SMP platform to run Postgres on. (Josh thinks that
> the specific motherboard chipset being used in these machines might
> share some of the blame too. I don't have any evidence for or against
> that idea, but it's certainly possible.)
>
> In the long run, however, CPUs continue to get faster than main memory
> and the price of cache contention will continue to rise. So it seems
> that we need to give up the assumption that SpinLockAcquire is a cheap
> operation. In the presence of heavy contention it won't be.

There's nothing about the way Postgres spinlocks are coded that affects this?

Is it something the kernel could help with? I've been wondering whether
there's any benefits postgres is missing out on by using its own hand-rolled
locking instead of using the pthreads infrastructure that the kernel is often
involved in.

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pg(at)fastcrypt(dot)com
Cc:	lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 02:20:22
Message-ID:	13960.1082341222@sss.pgh.pa.us

Dave Cramer <pg(at)fastcrypt(dot)com> writes:
> So the kernel/OS is irrelevant here? This happens on any dual Xeon?

I believe so. The context-switch behavior might possibly be a little
more pleasant on other kernels, but the underlying spinlock problem is
not dependent on the kernel.

> What about hyperthreading: does it still happen if HTT is turned off?

The problem comes from keeping the caches synchronized between multiple
physical CPUs. AFAICS enabling HTT wouldn't make it worse, because a
hyperthreaded processor still only has one cache.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 02:30:08
Message-ID:	14056.1082341808@sss.pgh.pa.us

Greg Stark <gsstark(at)mit(dot)edu> writes:
> There's nothing about the way Postgres spinlocks are coded that affects this?

No. AFAICS our spinlock sequences are pretty much equivalent to the way
the Linux kernel codes its spinlocks, so there's no deep dark knowledge
to be mined there.

We could possibly use some more-efficient blocking mechanism than semop()
once we've decided we have to block (it's a shame Linux still doesn't
have cross-process POSIX semaphores). But the striking thing I learned
from looking at the oprofile results is that most of the inefficiency
comes at the very first TAS() operation, before we've even "spun" let
alone decided we have to block. The s_lock() subroutine does not
account for more than a few percent of the runtime in these tests,
compared to 15% at the inline TAS() operations in LWLockAcquire and
LWLockRelease. I interpret this to mean that once it's acquired
ownership of the cache line, a Xeon can get through the "spinning"
loop in s_lock() mighty quickly.

regards, tom lane
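
The two-level structure described above (a very short spinlock-protected
section, followed by blocking only if the lock turns out to be busy) can be
sketched in Python as follows. This is an illustrative model, not the real
LWLockAcquire: a threading.Lock stands in for the spinlock/TAS step, a
per-waiter threading.Semaphore stands in for blocking in semop(), and the
class and counter names are assumptions.

```python
import threading
from collections import deque


class LWLockSketch:
    """Exclusive lock whose state is guarded by a short-lived inner lock."""

    def __init__(self):
        self._mutex = threading.Lock()  # stands in for the spinlock (TAS) step
        self._held = False
        self._waiters = deque()         # one semaphore per sleeping thread
        self.fast_path = 0              # acquisitions that never had to block
        self.blocked = 0                # times a thread went to sleep

    def acquire(self):
        first_try = True
        while True:
            with self._mutex:
                if not self._held:
                    self._held = True
                    if first_try:
                        self.fast_path += 1
                    return
                # Lock is busy: queue a private semaphore and sleep on it.
                sem = threading.Semaphore(0)
                self._waiters.append(sem)
                self.blocked += 1
            first_try = False
            sem.acquire()  # the blocking step; costs a context switch

    def release(self):
        with self._mutex:
            self._held = False
            waiter = self._waiters.popleft() if self._waiters else None
        if waiter is not None:
            waiter.release()  # wake one sleeper so that it can retry
```

Comparing fast_path with blocked after a run shows how often a thread found
the lock busy and had to sleep, which is the behavior that shows up as
semop() calls and context switches in the reports earlier in the thread.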

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pg(at)fastcrypt(dot)com
Cc:	lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 03:19:56
Message-ID:	14394.1082344796@sss.pgh.pa.us

>> What about hyperthreading: does it still happen if HTT is turned off?

> The problem comes from keeping the caches synchronized between multiple
> physical CPUs. AFAICS enabling HTT wouldn't make it worse, because a
> hyperthreaded processor still only has one cache.

Also, I forgot to say that the numbers I'm quoting *are* with HTT off.

regards, tom lane

From:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
To:	josh(at)agliodbs(dot)com
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: RESOLVED: Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 07:27:57
Message-ID:	40837F7D.9050102@aeccom.com

Josh, I cannot reproduce the excessive semop() calls on a Dual XEON DP with a
non-bigmem kernel, HT on. It would be interesting to know whether the problem
is related to XEON MP (as Tom wrote) or to bigmem.

Josh Berkus wrote:

> Dirk,
>
>> I'm not sure if this semop() problem is still an issue, but the database
>> behaves a bit out of bounds in this situation, i.e. consuming 95% of system
>> resources with semop() calls while tables are locked very often and for
>> longer periods.
>
> It would be helpful to us if you could test this with the indexes disabled on
> the non-Bigmem system. I'd like to eliminate Bigmem as a factor, if
> possible.

From:	"Sven Geisler" <sgeisler(at)aeccom(dot)com>
To:	<lutzeb(at)aeccom(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgreSQL(dot)org>, "Neil Conway" <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 12:27:44
Message-ID:	004201c42609$b7321390$6402a8c0@andesW2K

Hi Tom,

Just to explain our hardware situation related to the FSB of the XEONs:
We have older XEON DP boxes in operation with FSB 400 and 2.4 GHz.
The XEON MP box runs at 2.5 GHz.
The XEON MP box is a Fujitsu Siemens Primergy RX600 with ServerWorks GC LE
as chipset.

The box which Dirk used to compare the behavior is our newest XEON DP
system.
This XEON DP box runs at 2.8 GHz and FSB 533 using the Intel 7501 chipset
(Supermicro).

I would agree with Josh. If PostgreSQL has an issue with the Intel XEON MP
hardware, it is more related to the chipset.

Back to the SQL level. We use SELECT FOR UPDATE as a "semaphore".
Should we try another implementation for this semaphore on the client side to
prevent this issue?

Regards
Sven.

From:	Dave Cramer <pg(at)fastcrypt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 12:32:33
Message-ID:	1082377953.1554.77.camel@localhost.localdomain

Here's an interesting link that suggests that hyperthreading would be
much worse.

http://groups.google.com/groups?q=hyperthreading+dual+xeon+idle&start=10&hl=en&lr=&ie=UTF-8&c2coff=1&selm=aukkonen-FE5275.21093624062003%40shawnews.gv.shawcable.net&rnum=16

Another, which has some hints as to how it should be handled:

http://groups.google.com/groups?q=hyperthreading+dual+xeon+idle&start=10&hl=en&lr=&ie=UTF-8&c2coff=1&selm=u5tl1XD3BHA.2760%40tkmsftngp04&rnum=19

FWIW, I have anecdotal evidence that suggests that this is the case: one
of my clients was seeing very large numbers of context switches with HTT
turned on, and without it things were much better.

Dave
On Sun, 2004-04-18 at 23:19, Tom Lane wrote:
> >> What about hyperthreading: does it still happen if HTT is turned off?
>
> > The problem comes from keeping the caches synchronized between multiple
> > physical CPUs. AFAICS enabling HTT wouldn't make it worse, because a
> > hyperthreaded processor still only has one cache.
>
> Also, I forgot to say that the numbers I'm quoting *are* with HTT off.
>
> regards, tom lane
--
Dave Cramer
519 939 0336
ICQ # 14675561

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com
Cc:	pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 17:50:12
Message-ID:	200404191050.12833.josh@agliodbs.com

Tom,

I have 3 reasons for thinking that the chipset shares the blame:
1) The ServerWorks chipset is present in the fully documented cases that we
have of this problem so far. This is notable because the SW is notorious
for poor manufacturing quality, so much so that the company that made them is
currently in receivership. These chips were so bad that Dell was forced to
recall several hundred of its 2650s, where the motherboards caught fire!
2) The main defect of the SW is the NorthBridge, which could conceivably
adversely affect traffic between RAM and the processor cache.
3) Xeon MP is a very popular platform thanks to Dell, and if the processor
itself were at fault we would expect to be seeing more problem reports than
we are.

The other thing I'd like your comment on, Tom, is that Dirk appears to have
reported that when he installed a non-bigmem kernel, the issue went away.
Dirk, is this correct?

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 18:00:01
Message-ID:	26941.1082397601@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> The other thing I'd like your comment on, Tom, is that Dirk appears to have
> reported that when he installed a non-bigmem kernel, the issue went away.
> Dirk, is this correct?

I'd be really surprised if that had anything to do with it. AFAIR
Dirk's test changed more than one variable and so didn't prove a
connection.

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 18:32:59
Message-ID:	200404191832.i3JIWxx10909@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Josh Berkus wrote:
> Tom,
>
> > So in the short term I think we have to tell people that Xeon MP is not
> > the most desirable SMP platform to run Postgres on. (Josh thinks that
> > the specific motherboard chipset being used in these machines might
> > share some of the blame too. I don't have any evidence for or against
> > that idea, but it's certainly possible.)
>
> I have 3 reasons for thinking this:
> 1) the ServerWorks chipset is present in the fully documented cases that we
> have of this problem so far. This is notable because the SW is notorious
> for poor manufacturing quality, so much so that the company that made them is
> currently in receivership. These chips were so bad that Dell was forced to
> recall several hundred of its 2650s, where the motherboards caught fire!
> 2) the main defect of the SW is the NorthBridge, which could conceivably
> adversely affect traffic between RAM and the processor cache.
> 3) XeonMP is a very popular platform thanks to Dell, yet we are not seeing
> more problem reports than the few we have.
>
> The other thing I'd like your comment on, Tom, is that Dirk appears to have
> reported that when he installed a non-bigmem kernel, the issue went away.

I have BSD on a SuperMicro dual Xeon, so if folks want another
hardware/OS combination to test, I can give out logins to my machine.

http://candle.pha.pa.us/main/hardware.html

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

From:	"scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <lutzeb(at)aeccom(dot)com>, <pgsql-performance(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 20:12:32
Message-ID:	Pine.LNX.4.33.0404191410550.17372-100000@css120.ihs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Mon, 19 Apr 2004, Bruce Momjian wrote:

> Josh Berkus wrote:
> > Tom,
> >
> > > So in the short term I think we have to tell people that Xeon MP is not
> > > the most desirable SMP platform to run Postgres on. (Josh thinks that
> > > the specific motherboard chipset being used in these machines might
> > > share some of the blame too. I don't have any evidence for or against
> > > that idea, but it's certainly possible.)
> >
> > I have 3 reasons for thinking this:
> > 1) the ServerWorks chipset is present in the fully documented cases that we
> > have of this problem so far. This is notable because the SW is notorious
> > for poor manufacturing quality, so much so that the company that made them is
> > currently in receivership. These chips were so bad that Dell was forced to
> > recall several hundred of its 2650s, where the motherboards caught fire!
> > 2) the main defect of the SW is the NorthBridge, which could conceivably
> > adversely affect traffic between RAM and the processor cache.
> > 3) XeonMP is a very popular platform thanks to Dell, yet we are not seeing
> > more problem reports than the few we have.
> >
> > The other thing I'd like your comment on, Tom, is that Dirk appears to have
> > reported that when he installed a non-bigmem kernel, the issue went away.
>
> I have BSD on a SuperMicro dual Xeon, so if folks want another
> hardware/OS combination to test, I can give out logins to my machine.

I can probably do some nighttime testing on a dual 2800MHz non-MP Xeon
machine as well. It's a Dell 2600 series machine and very fast. It has
the moderately fast 533MHz FSB so may not have as many problems as the MP
type CPUs seem to be having.

From:	"Aaron Werman" <awerman(at)hotmail(dot)com>
To:	<pgsql-performance(at)postgresql(dot)org>
Subject:	Re: possible improvement between G4 and G5
Date:	2004-04-19 20:41:22
Message-ID:	BAY9-DAV33gsmDzYbB100044774@hotmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

There are a few things that you can do to help force yourself to be I/O
bound. These include:

- RAID 5 for write intensive applications, since multiple writes per synch
write is good. (There is a special case for logging or other streaming
sequential writes on RAID 5)

- Data journaling file systems are helpful in stress testing your
checkpoints

- Using midsized battery backed up write through buffering controllers. In
general, if you have a small cache, you see the problem directly, and a huge
cache will balance out load and defer writes to quieter times. That is why a
midsized cache is so useful in showing stress in your system only when it is
being stressed.

Only partly in jest,
/Aaron

BTW - I am truly curious about what happens to your system if you use
separate RAID 0+1 for your logs, disk sorts, and at least the most active
tables. This should reduce I/O load by an order of magnitude.

"Vivek Khera" <khera(at)kcilink(dot)com> wrote in message
news:x7smez7tqj(dot)fsf(at)yertle(dot)int(dot)kciLink(dot)com(dot)(dot)(dot)
> >>>>> "JB" == Josh Berkus <josh(at)agliodbs(dot)com> writes:
>
> JB> Aaron,
> >> I do consulting, so they're all over the place and tend to be complex.
> >> Very few fit in RAM, but still are very buffered. These are almost all
> >> backed with very high end I/O subsystems, with dozens of spindles with
> >> battery backed up writethrough cache and gigs of buffers, which may be
> >> why I worry so much about CPU. I have had this issue with multiple servers.
>
> JB> Aha, I think this is the difference. I never seem to be able to
> JB> get my clients to fork out for adequate disk support. They are
> JB> always running off single or double SCSI RAID in the host server;
> JB> not the sort of setup you have.
>
> Even when I upgraded my system to a 14-spindle RAID5 with 128M cache
> and 4GB RAM on a dual Xeon system, I still wind up being I/O bound
> quite often.
>
> I think it depends on what your "working set" turns out to be. My
> workload really spans a lot more of the DB than I can end up caching.
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D. Khera Communications, Inc.
> Internet: khera(at)kciLink(dot)com Rockville, MD +1-301-869-4449 x806
> AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	"scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 21:02:27
Message-ID:	40843E63.7050101@joeconway.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

scott.marlowe wrote:
> On Mon, 19 Apr 2004, Bruce Momjian wrote:
>>I have BSD on a SuperMicro dual Xeon, so if folks want another
>>hardware/OS combination to test, I can give out logins to my machine.
>
> I can probably do some nighttime testing on a dual 2800MHz non-MP Xeon
> machine as well. It's a Dell 2600 series machine and very fast. It has
> the moderately fast 533MHz FSB so may not have as many problems as the MP
> type CPUs seem to be having.

I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does
anyone have a test set that can reliably reproduce the problem?

Joe

From:	Dirk(dot)Lutzebaeck(at)t-online(dot)de (Dirk Lutzebaeck)
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 21:18:27
Message-ID:	40844223.9070200@aeccom.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

I agree with Tom that too many parameters are involved to blame
bigmem. I have access to the following machines where the same
application operates:

a) Dual (4way) XEON MP, bigmem, HT off, ServerWorks chipset (a
Fujitsu-Siemens Primergy)

performs ok now because missing indexes were added, but this is no proof
that the behaviour will not occur again under high load; context switches
are moderate but have peaks up to 40,000

b) Dual XEON DP, non-bigmem, HT on, ServerWorks chipset (a Dell machine
I think)

performs moderately because I see too many context switches here even though
the mentioned indexes have been created; context switches often go up to
30,000, and I can see 50% semop calls

c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)

performs well and I could not observe context switch peaks here (one
user active), almost no extra semop calls

d) Dual XEON DP, bigmem, HT off, ServerWorks chipset (a Fujitsu-Siemens
Primergy)

performance unknown at the moment (the machine is offline), but in the past it behaved like a)

I can run tests on those machines if somebody provides me with
test instructions to nail this problem down.

Dirk

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 21:55:04
Message-ID:	200404191455.04068.josh@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Joe,

> I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does
> anyone have a test set that can reliably reproduce the problem?

Unfortunately we can't seem to come up with one. So far we have 2 machines
that exhibit the issue, and their databases are highly confidential (State of
WA education data).

It does seem to require a database which is in the many GB (> 10GB), and a
situation where a small subset of the data is getting hit repeatedly by
multiple processes. So you could try your own data warehouse, making sure
that you have at least 4 connections hitting one query after another.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
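
A load of the shape described above (several connections, each running queries back to back) might be driven roughly as follows. This is only a sketch: it assumes psql is on the PATH, and the database name "warehouse" and script name "warehouse_query.sql" are hypothetical placeholders, not anything taken from the affected systems.

```python
import subprocess


def psql_command(dbname, script):
    """Build the argv for one psql session that runs a script file."""
    return ["psql", "-d", dbname, "-f", script]


def launch_sessions(count, dbname, script):
    """Start `count` concurrent psql sessions and return the Popen handles."""
    return [
        subprocess.Popen(psql_command(dbname, script), stdout=subprocess.DEVNULL)
        for _ in range(count)
    ]


def wait_all(sessions):
    """Block until every session exits and return the exit codes."""
    return [session.wait() for session in sessions]


# Example (not run here): four sessions against a hypothetical database.
# sessions = launch_sessions(4, "warehouse", "warehouse_query.sql")
# print(wait_all(sessions))
```

The script file would need to contain the queries to repeat; watching "vmstat 1" in another terminal while the sessions run shows whether the context-switch rate climbs.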

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	josh(at)agliodbs(dot)com
Cc:	Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-19 22:55:34
Message-ID:	29784.1082415334@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does
>> anyone have a test set that can reliably reproduce the problem?

> Unfortunately we can't seem to come up with one.

> It does seem to require a database which is in the many GB (> 10GB), and a
> situation where a small subset of the data is getting hit repeatedly by
> multiple processes.

I do not think a large database is actually necessary; the test case
Josh's client has is only hitting a relatively small amount of data.
The trick seems to be to cause lots and lots of ReadBuffer/ReleaseBuffer
activity without much else happening, and to do this from multiple
backends concurrently.

I believe the best way to make this happen is a lot of relatively simple
(but not short) indexscan queries that in aggregate touch just a bit
less than shared_buffers worth of data. I have not tried to make a
self-contained test case, but based on what I know now I think it should
be possible.

I'll give this a shot later tonight --- it does seem that trying to
reproduce the problem on different kinds of hardware is the next useful
step we can take.

regards, tom lane
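
For anyone sizing such a test, the amount of data that is "just a bit less than shared_buffers" can be estimated from the setting itself. The sketch below assumes the default 8 kB PostgreSQL block size; the 90% target and the 1000-buffer setting are purely illustrative choices, not numbers from the message above.

```python
# Rough estimate of how much data a test should touch so that it stays
# just under shared_buffers. Assumptions: default 8 kB block size and an
# arbitrary 90% target for "just a bit less".

BLOCK_SIZE_BYTES = 8192  # PostgreSQL default block size; custom builds may differ


def working_set_bytes(shared_buffers_pages, fill_percent=90):
    """Return the number of bytes covered by a percentage of the buffer pool."""
    if not 0 < fill_percent <= 100:
        raise ValueError("fill_percent must be in the range 1..100")
    # Integer arithmetic keeps the result exact.
    return shared_buffers_pages * BLOCK_SIZE_BYTES * fill_percent // 100


pages = 1000  # illustrative shared_buffers value, in buffers
target = working_set_bytes(pages)
print(f"{pages} buffers: aim for roughly {target / 2**20:.1f} MiB of touched data")
```

The index and heap pages both count toward the total, so the table data itself would need to be somewhat smaller than this figure.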

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	josh(at)agliodbs(dot)com
Cc:	Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 00:01:56
Message-ID:	1407.1082419316@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Here is a test case. To set up, run the "test_setup.sql" script once;
then launch two copies of the "test_run.sql" script. (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.) Check that you get a
nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up. On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.

What I actually see on Josh's client's machine is a context swap storm:
"vmstat 1" shows CS rates around 170K/sec. strace'ing the backends
shows a corresponding rate of semop() syscalls, with a few delaying
select()s sprinkled in. top(1) shows system CPU percent of 25-30
and idle CPU percent of 16-20.

I haven't bothered to check how long the test_run query takes, but if it
ends while you're still examining the behavior, just start it again.

Note the test case assumes you've got shared_buffers set to at least
1000; with smaller values, you may get some I/O syscalls, which will
probably skew the results.

regards, tom lane
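
To check whether a machine matches this description, the strace output of a backend can be tallied per syscall. The sketch below is one way to do that in Python. It assumes strace's usual "name(args) = result" line layout, optionally prefixed with "[pid N]", and the sample lines are made up for illustration rather than captured from any of the machines discussed here.

```python
import re
from collections import Counter

# Matches "semop(...", optionally preceded by strace's "[pid 1234] " prefix.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def tally_syscalls(strace_lines):
    """Count how many times each syscall name appears in strace output."""
    counts = Counter()
    for line in strace_lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample in strace's line format, for illustration only.
sample = [
    "semop(32769, 0xbfffe850, 1) = 0",
    "semop(32769, 0xbfffe850, 1) = 0",
    "[pid  4242] semop(32769, 0xbfffe850, 1) = 0",
    "select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)",
    "--- SIGALRM (Alarm clock) ---",
]

for name, count in tally_syscalls(sample).most_common():
    print(name, count)
```

Real input could come from a file written with strace's -o option while attached to a backend with -p. A tally dominated by semop() alongside a high cs rate in vmstat would fit the pattern described above.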

Attachment	Content-Type	Size
unknown_filename	text/plain	1.1 KB
unknown_filename	text/plain	309 bytes

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	josh(at)agliodbs(dot)com
Cc:	Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 00:53:09
Message-ID:	1931.1082422389@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

I wrote:
> Here is a test case.

Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
which seems to pretty much let the Xeon per se off the hook. Anybody
got a multiple Opteron to try? Totally non-Intel CPUs?

It would be interesting to see results with non-Linux kernels, too.

regards, tom lane

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	josh(at)agliodbs(dot)com, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 03:00:05
Message-ID:	40849235.2070808@joeconway.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Tom Lane wrote:
> Here is a test case. To set up, run the "test_setup.sql" script once;
> then launch two copies of the "test_run.sql" script. (For those of
> you with more than two CPUs, see whether you need one per CPU to make
> trouble, or whether two test_runs are enough.) Check that you get a
> nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

Check.

> In isolation, test_run.sql should do essentially no syscalls at all once
> it's past the initial ramp-up. On a machine that's functioning per
> expectations, multiple copies of test_run show a relatively low rate of
> semop() calls --- a few per second, at most --- and maybe a delaying
> select() here and there.
>
> What I actually see on Josh's client's machine is a context swap storm:
> "vmstat 1" shows CS rates around 170K/sec. strace'ing the backends
> shows a corresponding rate of semop() syscalls, with a few delaying
> select()s sprinkled in. top(1) shows system CPU percent of 25-30
> and idle CPU percent of 16-20.

Your test case works perfectly. I ran 4 concurrent psql sessions, on a
quad Xeon (IBM x445, 2.8GHz, 4GB RAM), hyperthreaded. Here's what 'top'
looks like:

177 processes: 173 sleeping, 3 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 35.9% 0.0% 7.2% 0.0% 0.0% 0.0% 56.8%
cpu00 19.6% 0.0% 4.9% 0.0% 0.0% 0.0% 75.4%
cpu01 44.1% 0.0% 7.8% 0.0% 0.0% 0.0% 48.0%
cpu02 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 100.0%
cpu03 32.3% 0.0% 13.7% 0.0% 0.0% 0.0% 53.9%
cpu04 21.5% 0.0% 10.7% 0.0% 0.0% 0.0% 67.6%
cpu05 42.1% 0.0% 9.8% 0.0% 0.0% 0.0% 48.0%
cpu06 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
cpu07 27.4% 0.0% 10.7% 0.0% 0.0% 0.0% 61.7%
Mem: 4123700k av, 3933896k used, 189804k free, 0k shrd, 221948k buff
2492124k actv, 760612k in_d, 41416k in_c
Swap: 2040244k av, 5632k used, 2034612k free 3113272k cached

Note that cpu06 is not a postgres process. The output of vmstat looks
like this:

# vmstat 1
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
4 0 5632 184264 221948 3113308 0 0 0 0 0 0 0 0 0 0
3 0 5632 184264 221948 3113308 0 0 0 0 112 211894 36 9 55 0
5 0 5632 184264 221948 3113308 0 0 0 0 125 222071 39 8 53 0
4 0 5632 184264 221948 3113308 0 0 0 0 110 215097 39 10 52 0
1 0 5632 184588 221948 3113308 0 0 0 96 139 187561 35 10 55 0
3 0 5632 184588 221948 3113308 0 0 0 0 114 241731 38 10 52 0
3 0 5632 184920 221948 3113308 0 0 0 0 132 257168 40 9 51 0
1 0 5632 184912 221948 3113308 0 0 0 0 114 251802 38 9 54 0

> Note the test case assumes you've got shared_buffers set to at least
> 1000; with smaller values, you may get some I/O syscalls, which will
> probably skew the results.

shared_buffers
----------------
16384
(1 row)

I found that killing three of the four concurrent queries dropped
context switches to about 70,000 to 100,000. Two or more sessions bring
it up to 200K+.

Joe
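
Numbers like these are easier to compare across machines when reduced to a mean and a peak. The following is a small Python sketch, not part of the original test case, that locates the "cs" column by its header name and summarises the samples. The input is the vmstat output quoted in the message above; the first data row is skipped because vmstat's first report is an average since boot rather than a one-second sample.

```python
from statistics import mean


def cs_samples(vmstat_text):
    """Return the values of the 'cs' column from vmstat output."""
    cs_index = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "cs" in fields:
            # Header row: remember where the context-switch column sits.
            cs_index = fields.index("cs")
        elif (cs_index is not None and len(fields) > cs_index
              and all(field.isdigit() for field in fields)):
            samples.append(int(fields[cs_index]))
    return samples


VMSTAT_OUTPUT = """\
r b swpd free buff cache si so bi bo in cs us sy id wa
4 0 5632 184264 221948 3113308 0 0 0 0 0 0 0 0 0 0
3 0 5632 184264 221948 3113308 0 0 0 0 112 211894 36 9 55 0
5 0 5632 184264 221948 3113308 0 0 0 0 125 222071 39 8 53 0
4 0 5632 184264 221948 3113308 0 0 0 0 110 215097 39 10 52 0
1 0 5632 184588 221948 3113308 0 0 0 96 139 187561 35 10 55 0
3 0 5632 184588 221948 3113308 0 0 0 0 114 241731 38 10 52 0
3 0 5632 184920 221948 3113308 0 0 0 0 132 257168 40 9 51 0
1 0 5632 184912 221948 3113308 0 0 0 0 114 251802 38 9 54 0
"""

# Drop the first report, which is not a per-interval sample.
samples = cs_samples(VMSTAT_OUTPUT)[1:]
print(f"samples={len(samples)} mean={mean(samples):.0f} peak={max(samples)}")
```

Looking the column up by name rather than by position keeps the function usable with vmstat versions that print a different set of columns.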

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	josh(at)agliodbs(dot)com, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 03:47:02
Message-ID:	20040419214702.70e5c9b6@thunder.mshome.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

When grilled further on (Mon, 19 Apr 2004 20:53:09 -0400),
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> confessed:

Same problem on my dual AMD MP with 2.6.5 kernel using two sessions of your
test, but maybe not quite as severe. The highest CS value I saw was 102k, with
some non-db number crunching going on in parallel with the test. The 'average'
was about 80k with two instances. Using the anticipatory scheduler.

A single instance pulls in around 200-300 CS, and with no tests running it is
also around 200-300 CS (i.e. no CS difference).

A snippet:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
3 0 284 90624 93452 1453740 0 0 0 0 1075 76548 83 17 0 0
6 0 284 125312 93452 1470196 0 0 0 0 1073 87702 78 22 0 0
3 0 284 178392 93460 1420208 0 0 76 298 1083 67721 77 24 0 0
4 0 284 177120 93460 1421500 0 0 1104 0 1054 89593 80 21 0 0
5 0 284 173504 93460 1425172 0 0 3584 0 1110 65536 81 19 0 0
4 0 284 169984 93460 1428708 0 0 3456 0 1098 66937 81 20 0 0
6 0 284 170944 93460 1428708 0 0 8 0 1045 66065 81 19 0 0
6 0 284 167288 93460 1428776 0 0 0 8 1097 75560 81 19 0 0
6 0 284 136296 93460 1458356 0 0 0 0 1036 80808 75 26 0 0
5 0 284 132864 93460 1461688 0 0 0 0 1007 76071 84 17 0 0
4 0 284 132880 93460 1461688 0 0 0 0 1079 86903 82 18 0 0
5 0 284 132880 93460 1461688 0 0 0 0 1078 79885 83 17 0 0
6 0 284 132648 93460 1461688 0 0 0 760 1228 66564 86 14 0 0
6 0 284 132648 93460 1461688 0 0 0 0 1047 69741 86 15 0 0
6 0 284 132672 93460 1461688 0 0 0 0 1057 79052 84 16 0 0
5 0 284 132672 93460 1461688 0 0 0 0 1054 81109 82 18 0 0
5 0 284 132736 93460 1461688 0 0 0 0 1043 91725 80 20 0 0

Cheers,
Rob

--
21:33:03 up 3 days, 1:10, 3 users, load average: 5.05, 4.67, 4.22
Linux 2.6.5-01 #5 SMP Tue Apr 6 21:32:39 MDT 2004

From:	jelle <jellej(at)pacbell(dot)net>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 05:18:21
Message-ID:	Pine.LNX.4.44.0404192211440.4079-100000@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Same problem with dual 1Ghz P3's running Postgres 7.4.2, linux 2.4.x, and
2GB ram, under load, with long transactions (i.e. 1 "cannot serialize"
rollback per minute). 200K was the worst observed with vmstat.

Finally moved DB to a single xeon box.

From:	ohp(at)pyrenet(dot)fr
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	josh(at)agliodbs(dot)com, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 10:35:50
Message-ID:	Pine.UW2.4.53.0404201234340.21865@server.pyrenet.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Hi Tom,

You still have an account on my Unixware Bi-Xeon hyperthreaded machine.
Feel free to use it for your tests.

--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp(at)pyrenet(dot)fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

From:	Jeff <threshar(at)torgo(dot)978(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, josh(at)agliodbs(dot)com, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 12:46:12
Message-ID:	B41A54FF-92C8-11D8-A8BE-000D9366F0C4@torgo.978.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Apr 19, 2004, at 8:01 PM, Tom Lane wrote:
[test case]

Quad P3-700MHz, ServerWorks, pg 7.4.2:
1 process: 10-30 cs/sec
2 processes: 100k cs/sec
3 processes: 140k cs/sec
8 processes: 115k cs/sec

Dual P2-450MHz, non-ServerWorks (piix):
1 process: 15-20 cs/sec
2 processes: 30k cs/sec
3 (up to 7) processes: 15k cs/sec

(Yes, I verified that with more processes the cs rate drops)

And finally,

6 cpu sun e4500, solaris 2.6, pg 7.4.2: 1 - 10 processes: hovered
between 2-3k cs/second (there was other stuff running on the machine as
well)

Verrry interesting.
I've got a dual G4 at home, but for convenience Apple doesn't ship a
vmstat that tells context switches

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

From:	Dave Cramer <pg(at)fastcrypt(dot)com>
To:	Jeff <threshar(at)torgo(dot)978(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 13:06:59
Message-ID:	1082466419.3069.132.camel@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Dual Athlon

With one process running: 30 cs/second
With two processes running: 15000 cs/second

Dave
--
Dave Cramer
519 939 0336
ICQ # 14675561

From:	"Matt Clark" <matt(at)ymogen(dot)net>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <josh(at)agliodbs(dot)com>
Cc:	"Joe Conway" <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <lutzeb(at)aeccom(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Neil Conway" <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 13:44:40
Message-ID:	OAEAKHEHCMLBLIDGAFELCEMNFFAA.matt@ymogen.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), Quad Xeon 700MHz/1MB L2 cache, 3GB RAM.

Idle-ish (it's a production server) cs/sec ~5000

3 test queries running:
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
3 0 0 23380 577680 105912 2145140 0 0 0 0 107 116890 50 14 35
2 0 0 23380 577680 105912 2145140 0 0 0 0 114 118583 50 15 34
2 0 0 23380 577680 105912 2145140 0 0 0 0 107 115842 54 14 32
2 1 0 23380 577680 105920 2145140 0 0 0 32 156 117549 50 16 35

HTH

Matt

From:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 14:29:01
Message-ID:	408533AD.40904@aeccom.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Dirk Lutzebaeck wrote:

> c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)
>
> performs well and I could not observe context switch peaks here (one
> user active), almost no extra semop calls

I ran Tom's test here: with 2 processes I reach 200k+ CS with peaks of
300k CS. Bummer. Josh, I don't think you can blame the ServerWorks
chipset or bigmem here.

Dirk

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 16:48:14
Message-ID:	200404201648.i3KGmEV27394@candle.pha.pa.us
Lists:	pgsql-performance

Dirk Lutzebäck wrote:
> Dirk Lutzebaeck wrote:
>
> > c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)
> >
> > performs well and I could not observe context switch peaks here (one
> > user active), almost no extra semop calls
>
> Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to
> 300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks
> chipset here nor bigmem.

Dave Cramer reproduced the problem on my SuperMicro dual Xeon on BSD/OS.

From:	"J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 17:17:22
Message-ID:	1082481442.997.4.camel@localhost.localdomain
Lists:	pgsql-performance

I verified the problem on a dual Opteron server. I temporarily killed the
normal load, so the server was largely idle when the test was run.

Hardware:
2x Opteron 242
Rioworks HDAMA server board
4 GB RAM

OS Kernel:
RedHat9 + XFS

1 proc: 10-15 cs/sec
2 proc: 400,000-420,000 cs/sec

j. andrew rogers

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgreSQL(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 17:58:18
Message-ID:	200404201058.18840.josh@agliodbs.com
Lists:	pgsql-performance

Dirk, Tom,

OK, off IRC, I have the following reports:

Linux 2.4.21 or 2.4.20 on dual Pentium III : problem verified
Linux 2.4.21 or 2.4.20 on dual Pentium II : problem cannot be reproduced
Solaris 2.6 on 6 cpu e4500 (using 8 processes) : problem not reproduced

--
-Josh Berkus
Aglio Database Solutions
San Francisco

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, Postgresql Performance <pgsql-performance(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 19:48:50
Message-ID:	1082490529.80320.71.camel@jester
Lists:	pgsql-performance

> It would be interesting to see results with non-Linux kernels, too.

Dual Celeron 500 MHz (Abit BP6 mobo) - client & server on same machine

2 processes FreeBSD (5.2.1): 1800cs
3 processes FreeBSD: 14000cs
4 processes FreeBSD: 14500cs

2 processes Linux (2.4.18 kernel): 52000cs
3 processes Linux: 10000cs
4 processes Linux: 20000cs

From:	Paul Tuckfield <paul(at)tuckfield(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 20:02:43
Message-ID:	AF7EFDBC-9305-11D8-BA67-000393BD6C3E@tuckfield.com
Lists:	pgsql-performance

I tried to test how this is related to cache coherency, by forcing
affinity of the two test_run.sql processes to the two cores (pipelines?
threads) of a single hyperthreaded xeon processor in an smp xeon box.

When the processes are allowed to run on distinct chips in the smp box,
the CS storm happens. When they are "bound" to the two cores of a
single hyperthreaded Xeon in the smp box, the CS storm *does* happen.

I used the taskset command:
taskset 01 -p <pid for backend of test_run.sql 1>
taskset 01 -p <pid for backend of test_run.sql 1>

I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on
the first Xeon processor in the box.

I did this on RedHat Fedora core1 on an intel motherboard (I'll get the
part no if it matters)

during storms : 300k CS/sec, 75% idle (on a dual xeon (four core))
machine (suggesting serializing/sleeping processes)
no storm: 50k CS/sec, 50% idle (suggesting 2 cpu bound processes)

Maybe there's a "hot block" that is bouncing back and forth between
caches? or maybe the page holding semaphores?

On Apr 19, 2004, at 5:53 PM, Tom Lane wrote:

> I wrote:
>> Here is a test case.
>
> Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
> which seems to pretty much let the Xeon per se off the hook. Anybody
> got a multiple Opteron to try? Totally non-Intel CPUs?
>
> It would be interesting to see results with non-Linux kernels, too.
>
> regards, tom lane
>

From:	"Magnus Naeslund(t)" <mag(at)fbab(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	josh(at)agliodbs(dot)com, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-20 22:47:49
Message-ID:	4085A895.6080403@fbab.net
Lists:	pgsql-performance

Tom Lane wrote:
>
> Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
> which seems to pretty much let the Xeon per se off the hook. Anybody
> got a multiple Opteron to try? Totally non-Intel CPUs?
>
> It would be interesting to see results with non-Linux kernels, too.
>
> regards, tom lane

I also tested on a dual Athlon MP Tyan Thunder motherboard (2x MP 2800+,
2.5 GB memory), and got the same high numbers.
I then ran with kernel 2.6.5; it lowered them a little, but there is still
some ping-pong effect here. I wonder if this is some effect of the
scheduler, maybe the scheduling frequency alone (100 Hz vs 1000 Hz).

It would be interesting to see what a FUTEX-style locking implementation
would give on a 2.6 kernel; as I understand it, that would work
across processes with some work.

The first file attached is kernel 2.4 running one process then starting
up the other one.
Same with second file, but with kernel 2.6...

Regards
Magnus

Attachment	Content-Type	Size
vmstat_1-1	text/plain	2.2 KB
vmstat_1-2	text/plain	2.0 KB

From:	Paul Tuckfield <paul(at)tuckfield(dot)com>
To:	Paul Tuckfield <paul(at)tuckfield(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-21 00:34:13
Message-ID:	9D274446-932B-11D8-BA67-000393BD6C3E@tuckfield.com
Lists:	pgsql-performance

Oops, what I meant to say was that 2 threads bound to one
(hyperthreaded) CPU does *NOT* cause the storm, even on an SMP Xeon.

Therefore, the context switches may be a result of cache-coherency-related
delays. (2 threads on one hyperthreaded CPU presumably have
tightly coupled L1/L2 cache.)

On Apr 20, 2004, at 1:02 PM, Paul Tuckfield wrote:

> I tried to test how this is related to cache coherency, by forcing
> affinity of the two test_run.sql processes to the two cores
> (pipelines? threads) of a single hyperthreaded xeon processor in an
> smp xeon box.
>
> When the processes are allowed to run on distinct chips in the smp
> box, the CS storm happens. When they are "bound" to the two cores of
> a single hyperthreaded Xeon in the smp box, the CS storm *does*
> happen.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ er, meant *NOT HAPPEN*
>
>
>
> I used the taskset command:
> taskset 01 -p <pid for backend of test_run.sql 1>
> taskset 01 -p <pid for backend of test_run.sql 1>
>
> I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on
> the first Xeon processor in the box.
>
> I did this on RedHat Fedora core1 on an intel motherboard (I'll get
> the part no if it matters)
>
> during storms : 300k CS/sec, 75% idle (on a dual xeon (four core))
> machine (suggesting serializing/sleeping processes)
> no storm: 50k CS/sec, 50% idle (suggesting 2 cpu bound processes)
>
>
> Maybe there's a "hot block" that is bouncing back and forth between
> caches? or maybe the page holding semaphores?
>
> On Apr 19, 2004, at 5:53 PM, Tom Lane wrote:
>
>> I wrote:
>>> Here is a test case.
>>
>> Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
>> which seems to pretty much let the Xeon per se off the hook. Anybody
>> got a multiple Opteron to try? Totally non-Intel CPUs?
>>
>> It would be interesting to see results with non-Linux kernels, too.
>>
>> regards, tom lane
>>

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	josh(at)agliodbs(dot)com, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-21 03:46:58
Message-ID:	4085EEB2.5080007@joeconway.com
Lists:	pgsql-performance

Joe Conway wrote:
>> In isolation, test_run.sql should do essentially no syscalls at all once
>> it's past the initial ramp-up. On a machine that's functioning per
>> expectations, multiple copies of test_run show a relatively low rate of
>> semop() calls --- a few per second, at most --- and maybe a delaying
>> select() here and there.

Here's results for 7.4 on a dual Athlon server running fedora core:

CPU states:  cpu    user    nice  system     irq  softirq  iowait    idle
           total   86.0%    0.0%   52.4%    0.0%     0.0%    0.0%   61.2%
           cpu00   37.6%    0.0%   29.7%    0.0%     0.0%    0.0%   32.6%
           cpu01   48.5%    0.0%   22.7%    0.0%     0.0%    0.0%   28.7%

   procs                      memory      swap          io     system   cpu
 r  b   swpd   free   buff    cache   si   so    bi    bo   in      cs
 1  0 120448  25764  48300  1094576    0    0     0   124  170     187
 1  0 120448  25780  48300  1094576    0    0     0     0  152      89
 2  0 120448  25744  48300  1094580    0    0     0    60  141   78290
 2  0 120448  25752  48300  1094580    0    0     0     0  131  140326
 2  0 120448  25756  48300  1094576    0    0     0    40  122  140100
 2  0 120448  25764  48300  1094584    0    0     0    60  133  136595
 2  0 120448  24284  48300  1094584    0    0     0   200  138  135151

The jump in cs corresponds to starting the query in the second session.

Joe

From:	ohp(at)pyrenet(dot)fr
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	josh(at)agliodbs(dot)com, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-21 11:18:53
Message-ID:	Pine.UW2.4.53.0404211315000.9232@server.pyrenet.fr
Lists:	pgsql-performance

How long is this test supposed to run?

I've launched just one copy for testing. The plan seems horrible: the test
is CPU-bound and hasn't finished yet after 17:02 min of CPU time on a dual
Xeon 2.6 GHz running UnixWare 7.1.3.

The machine is a Fujitsu-Siemens TX 200 server
On Mon, 19 Apr 2004, Tom Lane wrote:

> Date: Mon, 19 Apr 2004 20:01:56 -0400
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> To: josh(at)agliodbs(dot)com
> Cc: Joe Conway <mail(at)joeconway(dot)com>, scott.marlowe <scott(dot)marlowe(at)ihs(dot)com>,
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com,
> pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
> Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
>
> Here is a test case. To set up, run the "test_setup.sql" script once;
> then launch two copies of the "test_run.sql" script. (For those of
> you with more than two CPUs, see whether you need one per CPU to make
> trouble, or whether two test_runs are enough.) Check that you get a
> nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
>
> In isolation, test_run.sql should do essentially no syscalls at all once
> it's past the initial ramp-up. On a machine that's functioning per
> expectations, multiple copies of test_run show a relatively low rate of
> semop() calls --- a few per second, at most --- and maybe a delaying
> select() here and there.
>
> What I actually see on Josh's client's machine is a context swap storm:
> "vmstat 1" shows CS rates around 170K/sec. strace'ing the backends
> shows a corresponding rate of semop() syscalls, with a few delaying
> select()s sprinkled in. top(1) shows system CPU percent of 25-30
> and idle CPU percent of 16-20.
>
> I haven't bothered to check how long the test_run query takes, but if it
> ends while you're still examining the behavior, just start it again.
>
> Note the test case assumes you've got shared_buffers set to at least
> 1000; with smaller values, you may get some I/O syscalls, which will
> probably skew the results.
>
> regards, tom lane
>
>

From:	Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
To:	ohp(at)pyrenet(dot)fr
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-21 12:10:55
Message-ID:	408664CF.9040507@aeccom.com
Lists:	pgsql-performance

It is intended to run indefinitely.

Dirk

ohp(at)pyrenet(dot)fr wrote:

>How long is this test supposed to run?
>
>I've launched just 1 for testing, the plan seems horrible; the test is cpu
>bound and hasn't finished yet after 17:02 min of CPU time, dual XEON 2.6G
>Unixware 713
>
>The machine is a Fujitsu-Siemens TX 200 server
>
>

From:	Dave Cramer <pg(at)fastcrypt(dot)com>
To:	Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>
Cc:	ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-21 15:05:31
Message-ID:	1082559931.1557.235.camel@localhost.localdomain
Lists:	pgsql-performance

After some testing: if you use the current HEAD code for s_lock.c, which
has some mods in it to alleviate this situation, and change
SPINS_PER_DELAY to 10, you can drastically reduce the CS rate with Tom's
test. I am seeing a slight degradation in throughput using pgbench -c 10 -t
1000, but it might be liveable, considering the alternative is unbearable
in some situations.
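
To make the knob concrete, here is a rough Python sketch of a spin-then-delay lock acquisition. It is not the s_lock.c code: the names spin_acquire, delay_s and max_delays are invented for illustration, and a non-blocking threading.Lock.acquire stands in for the hardware test-and-set. Only the general idea (spin spins_per_delay times, sleep, then spin again) is taken from the discussion in this thread.

```python
import threading
import time


def spin_acquire(lock, spins_per_delay=100, delay_s=0.01, max_delays=1000):
    """Acquire `lock` by spinning, sleeping after every batch of failed spins.

    Returns the number of sleeps that were needed. All parameter names and
    defaults are illustrative assumptions, not PostgreSQL's.
    """
    delays = 0
    while True:
        # Busy-wait phase: retry the non-blocking acquire a fixed number of times.
        for _ in range(spins_per_delay):
            if lock.acquire(blocking=False):
                return delays
        if delays >= max_delays:
            raise TimeoutError("gave up waiting for the lock")
        # Delay phase: give up the CPU for a while before spinning again.
        time.sleep(delay_s)
        delays += 1


if __name__ == "__main__":
    demo_lock = threading.Lock()
    print("sleeps needed on a free lock:", spin_acquire(demo_lock))
```

A lower spins_per_delay makes a waiter give up the CPU sooner; a higher one makes it burn more cycles before sleeping. Which of the two wins on a given machine is exactly what the measurements in this thread are trying to establish.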

Can anyone else replicate my results?

Dave
On Wed, 2004-04-21 at 08:10, Dirk_Lutzebäck wrote:
> It is intended to run indefinitely.
>
> Dirk
>
> ohp(at)pyrenet(dot)fr wrote:
>
> >How long is this test supposed to run?
> >
> >I've launched just 1 for testing, the plan seems horrible; the test is cpu
> >bound and hasn't finished yet after 17:02 min of CPU time, dual XEON 2.6G
> >Unixware 713
> >
> >The machine is a Fujitsu-Siemens TX 200 server
> >
> >
--
Dave Cramer
519 939 0336
ICQ # 14675561

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pg(at)fastcrypt(dot)com, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>
Cc:	ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-21 17:29:43
Message-ID:	200404211029.43675.josh@agliodbs.com
Lists:	pgsql-performance

Dave,

> After some testing if you use the current head code for s_lock.c which
> has some mods in it to alleviate this situation, and change
> SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
> I am seeing a slight degradation in throughput using pgbench -c 10 -t
> 1000 but it might be liveable, considering the alternative is unbearable
> in some situations.
>
> Can anyone else replicate my results?

Can you produce a patch against 7.4.1? I'd like to test your fix against a
real-world database.

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Paul Tuckfield <paul(at)tuckfield(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-22 03:10:43
Message-ID:	18620.1082603443@sss.pgh.pa.us
Lists:	pgsql-performance

Paul Tuckfield <paul(at)tuckfield(dot)com> writes:
>> I used the taskset command:
>> taskset 01 -p <pid for backend of test_run.sql 1>
>> taskset 01 -p <pid for backend of test_run.sql 1>
>>
>> I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on
>> the first Xeon processor in the box.

AFAICT, what you've actually done here is to bind both backends to the
first logical processor of the first Xeon. If you'd used 01 and 02
as the affinity masks then you'd have bound them to the two cores of
that Xeon, but what you actually did simply reduces the system to a
uniprocessor. In that situation the context swap rate will be normally
one swap per scheduler timeslice, and at worst two swaps per timeslice
(if a process is swapped away from while it holds a lock the other one
wants). It doesn't prove a lot about our SMP problem though.
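
For reference, taskset reads its mask argument as a hexadecimal bitmask in which bit N selects logical CPU N. The small Python sketch below (the helper name cpus_in_mask is made up here) shows why passing 01 twice pins both backends to the same logical CPU. Whether logical CPUs 0 and 1 are the two hyperthreads of one physical Xeon depends on how the kernel numbers them, so treat that mapping as an assumption to verify on the machine in question.

```python
from typing import List


def cpus_in_mask(mask_hex: str) -> List[int]:
    """Return the logical CPU numbers selected by a hexadecimal affinity mask."""
    mask = int(mask_hex, 16)
    # Bit N set in the mask means logical CPU N is allowed.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    for m in ("01", "02", "03"):
        print(m, "->", cpus_in_mask(m))
```

With masks 01 and 02 the two backends land on different logical CPUs; with 01 and 01 they share one, which is the uniprocessor case described above.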

I don't have access to a Xeon with both taskset and hyperthreading
enabled, so I can't check what happens when you do the taskset correctly
... could you retry?

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	"Magnus Naeslund(t)" <mag(at)fbab(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-26 19:20:58
Message-ID:	200404261220.58257.josh@agliodbs.com
Lists:	pgsql-performance

Magnus,

> It would be interesting to see what a locking implementation ala FUTEX
> style would give on an 2.6 kernel, as i understood it that would work
> cross process with some work.

I'm working on testing a FUTEX patch, but am having some trouble with it.
Will let you know the results ...

--
-Josh Berkus
Aglio Database Solutions
San Francisco

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pg(at)fastcrypt(dot)com, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-29 00:57:53
Message-ID:	20040428185753.56614b2c@thunder.mshome.net
Lists:	pgsql-performance

When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
Josh Berkus <josh(at)agliodbs(dot)com> confessed:

> Dave,
>
> > After some testing if you use the current head code for s_lock.c which
> > has some mods in it to alleviate this situation, and change
> > SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
> > I am seeing a slight degradation in throughput using pgbench -c 10 -t
> > 1000 but it might be liveable, considering the alternative is unbearable
> > in some situations.
> >
> > Can anyone else replicate my results?
>
> Can you produce a patch against 7.4.1? I'd like to test your fix against a
> real-world database.

I would like to see the same, as I have a system that exhibits the same behavior
on a production db that's running 7.4.1.

Cheers,
Rob

--
18:55:22 up 1:40, 4 users, load average: 2.00, 2.04, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004

From:	ohp(at)pyrenet(dot)fr
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pg(at)fastcrypt(dot)com, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-29 13:20:18
Message-ID:	Pine.UW2.4.53.0404291519190.7358@server.pyrenet.fr
Lists:	pgsql-performance

I'd LOVE to contribute on this but I don't have vmstat and I'm not running
linux.

How can I help?
Regards
On Wed, 28 Apr 2004, Robert Creager wrote:

> Date: Wed, 28 Apr 2004 18:57:53 -0600
> From: Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
> To: Josh Berkus <josh(at)agliodbs(dot)com>
> Cc: pg(at)fastcrypt(dot)com, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr,
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>,
> scott.marlowe <scott(dot)marlowe(at)ihs(dot)com>,
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org,
> Neil Conway <neilc(at)samurai(dot)com>
> Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
>
> When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
> Josh Berkus <josh(at)agliodbs(dot)com> confessed:
>
> > Dave,
> >
> > > After some testing if you use the current head code for s_lock.c which
> > > has some mods in it to alleviate this situation, and change
> > > SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
> > > I am seeing a slight degradation in throughput using pgbench -c 10 -t
> > > 1000 but it might be liveable, considering the alternative is unbearable
> > > in some situations.
> > >
> > > Can anyone else replicate my results?
> >
> > Can you produce a patch against 7.4.1? I'd like to test your fix against a
> > real-world database.
>
> I would like to see the same, as I have a system that exhibits the same behavior
> on a production db that's running 7.4.1.
>
> Cheers,
> Rob
>
>
>

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	pg(at)fastcrypt(dot)com, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-04-29 18:21:51
Message-ID:	200404291121.51247.josh@agliodbs.com
Lists:	pgsql-performance

Rob,

> I would like to see the same, as I have a system that exhibits the same
> behavior on a production db that's running 7.4.1.

If you check the thread follow-ups, you'll see that *decreasing*
spins_per_delay was not beneficial. Instead, try increasing it, one step
at a time:

(take baseline measurement at 100)
250
500
1000
1500
2000
3000
5000

... until you find an optimal level. Then report the results to us!

--
-Josh Berkus
Aglio Database Solutions
San Francisco

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	josh(at)agliodbs(dot)com
Cc:	pg(at)fastcrypt(dot)com, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-01 04:03:06
Message-ID:	20040430220306.15d95162@thunder.mshome.net
Lists:	pgsql-performance

When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
Josh Berkus <josh(at)agliodbs(dot)com> confessed:

> spins_per_delay was not beneficial. Instead, try increasing them, one step
> at a time:
>
> (take baseline measurement at 100)
> 250
> 500
> 1000
> 1500
> 2000
> 3000
> 5000
>
> ... until you find an optimal level. Then report the results to us!
>

Some results. The patch mentioned is what Dave Cramer posted to the Performance
list on 4/21.

A Perl script monitored <vmstat 1> for 120 seconds and generated max and average
values. Unfortunately, I am not present on site, so I cannot physically change
the device under test to increase the db load to where it hit about 10 days ago.
That will have to wait till the 'real' work week on Monday.

Context switches     -    avg     max

Default 7.4.1 code   :  10665   69470
Default patch - 10   :  17297   21929
patch at 100         :  26825   87073
patch at 1000        :  37580  110849
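
For anyone who wants to collect comparable numbers, the kind of monitoring described above can be approximated with a few lines of Python. This is not the Perl script used for the figures in this message; the function name cs_stats is invented, and it assumes the classic vmstat layout in which a header row names a "cs" column.

```python
def cs_stats(vmstat_lines):
    """Return (average, maximum) of the cs column from `vmstat 1` output lines."""
    cs_index = None
    values = []
    for line in vmstat_lines:
        fields = line.split()
        if "cs" in fields:
            # Column-name header row; vmstat repeats it periodically.
            cs_index = fields.index("cs")
            continue
        if cs_index is None or len(fields) <= cs_index:
            continue  # group header, blank line, or truncated row
        if not fields[cs_index].isdigit():
            continue
        values.append(int(fields[cs_index]))
    if not values:
        raise ValueError("no context-switch samples found")
    return sum(values) / len(values), max(values)


if __name__ == "__main__":
    import sys
    average, peak = cs_stats(sys.stdin)
    print("avg cs/sec: %.0f  max cs/sec: %d" % (average, peak))
```

One way to use it would be to capture vmstat 1 output to a file for the sample period and pipe that file into the script. Note that the first row vmstat prints is an average since boot, so it may be worth discarding before computing statistics.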

Now granted, the db isn't showing the CS swap problem in a bad way (at all), but
should the numbers be trending the way they are with the patched code? Or will
these numbers potentially change dramatically when I can load up the db?

And, presuming I can re-produce what I was seeing previously (200K CS/s), you
folks want me to carry on with more testing of the patch and report the results?
Or just go away and be quiet...

The information is from an HP ProLiant DL380 G3 with 2x 2.4 GHz Xeons
(with HT enabled), 2 GB RAM, running the 2.4.22-26mdkenterprise kernel, a RAID
controller with 128 MB battery-backed cache, RAID 1 on 2x 15K RPM drives for
the WAL drive, and RAID 0+1 on 4x 10K RPM drives for data. The only job this
box has is running this db.

Cheers,
Rob

--
21:54:48 up 2 days, 4:39, 4 users, load average: 2.00, 2.03, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004

From:	Dave Cramer <pg(at)fastcrypt(dot)com>
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-01 18:50:47
Message-ID:	1083437446.25697.109.camel@localhost.localdomain
Lists:	pgsql-performance

No, don't go away and be quiet. Keep testing, it may be that under
normal operation the context switching goes up but under the conditions
that you were seeing the high CS it may not be as bad.

As others have mentioned the real solution to this is to rewrite the
buffer management so that the lock isn't quite as coarse grained.
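
As a generic illustration of what "less coarse grained" can mean, the Python sketch below replaces one global lock with an array of locks selected by hashing the buffer identifier. It is only a textbook lock-partitioning pattern with invented names (PartitionedLocks, lock_for, n_partitions); it does not describe how PostgreSQL's buffer manager works or how it would be rewritten.

```python
import threading


class PartitionedLocks:
    """A fixed set of locks; each buffer id maps to exactly one of them."""

    def __init__(self, n_partitions=16):
        self._locks = [threading.Lock() for _ in range(n_partitions)]

    def lock_for(self, buffer_id):
        # Backends touching buffers in different partitions no longer
        # contend on the same lock.
        return self._locks[hash(buffer_id) % len(self._locks)]


if __name__ == "__main__":
    locks = PartitionedLocks()
    with locks.lock_for(42):
        print("holding only the partition lock for buffer 42")
```

The trade-off is that any operation spanning several partitions has to take several locks in a consistent order, which is part of why such a rewrite is more than a quick patch.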

Dave
On Sat, 2004-05-01 at 00:03, Robert Creager wrote:
> When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
> Josh Berkus <josh(at)agliodbs(dot)com> confessed:
>
> > spins_per_delay was not beneficial. Instead, try increasing them, one step
> > at a time:
> >
> > (take baseline measurement at 100)
> > 250
> > 500
> > 1000
> > 1500
> > 2000
> > 3000
> > 5000
> >
> > ... until you find an optimal level. Then report the results to us!
> >
>
> Some results. The patch mentioned is what Dave Cramer posted to the Performance
> list on 4/21.
>
> A Perl script monitored <vmstat 1> for 120 seconds and generated max and average
> values. Unfortunately, I am not present on site, so I cannot physically change
> the device under test to increase the db load to where it hit about 10 days ago.
> That will have to wait till the 'real' work week on Monday.
>
> Context switches - avg max
>
> Default 7.4.1 code : 10665 69470
> Default patch - 10 : 17297 21929
> patch at 100 : 26825 87073
> patch at 1000 : 37580 110849
>
> Now granted, the db isn't showing the CS swap problem in a bad way (at all), but
> should the numbers be trending the way they are with the patched code? Or will
> these numbers potentially change dramatically when I can load up the db?
>
> And, presuming I can re-produce what I was seeing previously (200K CS/s), you
> folks want me to carry on with more testing of the patch and report the results?
> Or just go away and be quiet...
>
> The information is provided from a HP Proliant DL380 G3 with 2x 2.4 GHZ Xenon's
> (with HT enabled) 2 GB ram, running 2.4.22-26mdkenterprise kernel, RAID
> controller w/128 Mb battery backed cache RAID 1 on 2x 15K RPM drives for WAL
> drive, RAID 0+1 on 4x 10K RPM drives for data. The only job this box has is
> running this db.
>
> Cheers,
> Rob
--
Dave Cramer
519 939 0336
ICQ # 14675561

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	pg(at)fastcrypt(dot)com
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-02 15:20:47
Message-ID:	20040502092047.029525f6@thunder.mshome.net
Lists:	pgsql-performance

Found some co-workers at work yesterday to load up my library...

The sample period is 5 minutes long (vs 2 minutes previously):

Context switches     -    avg     max

Default 7.4.1 code   :  48784  107354
Default patch - 10   :  20400   28160
patch at 100         :  38574   85372
patch at 1000        :  41188  106569

The reading at 1000 was not produced under the same circumstances as the prior
readings as I had to replace my device under test with a simulated one. The
real one died.

The previous run with smaller database and 120 second averages:

Context switches         avg      max

Default 7.4.1 code :   10665    69470
Default patch - 10 :   17297    21929
patch at 100       :   26825    87073
patch at 1000      :   37580   110849

--
20:13:50 up 3 days, 2:58, 4 users, load average: 2.12, 2.14, 2.10
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004
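
To put the 5-minute figures above on a common scale, each row can be
expressed as a percent change against the default 7.4.1 build. The following
is only a minimal sketch of that arithmetic: the struct and function names
are made up for illustration, and the numbers are copied from the table
above. On the averages, the default patch at 10 comes out roughly 58% below
the unpatched build; keep in mind that the reading at 1000 was taken under
different conditions.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the context-switch table: label plus avg and max CS/s. */
struct cs_row {
    const char *label;
    long avg;
    long max;
};

/* Percent change of value relative to baseline (baseline must be non-zero). */
static double pct_change(long value, long baseline)
{
    assert(baseline != 0);
    return 100.0 * (double)(value - baseline) / (double)baseline;
}

/* Print every row after the first as a percent change against rows[0]. */
static void print_relative(const struct cs_row *rows, int n)
{
    for (int i = 1; i < n; i++)
        printf("%-20s avg %+7.1f%%  max %+7.1f%%\n", rows[i].label,
               pct_change(rows[i].avg, rows[0].avg),
               pct_change(rows[i].max, rows[0].max));
}

/* Figures from the 5-minute run reported above. */
static const struct cs_row five_minute_run[] = {
    {"Default 7.4.1 code", 48784, 107354},
    {"Default patch - 10", 20400, 28160},
    {"patch at 100", 38574, 85372},
    {"patch at 1000", 41188, 106569},
};
```

Calling print_relative(five_minute_run, 4) from a main() prints one line per
patched configuration.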

From:	Dave Cramer <pg(at)fastcrypt(dot)com>
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-02 15:39:22
Message-ID:	1083512362.25096.131.camel@localhost.localdomain
Lists:	pgsql-performance

Robert,

The real question is: does it help under real-life circumstances?

Did you run the tests with Tom's SQL code that is designed to produce a high
rate of context switches?

Dave
On Sun, 2004-05-02 at 11:20, Robert Creager wrote:
> Found some co-workers at work yesterday to load up my library...
>
> The sample period is 5 minutes long (vs 2 minutes previously):
>
> Context switches         avg      max
>
> Default 7.4.1 code :   48784   107354
> Default patch - 10 :   20400    28160
> patch at 100       :   38574    85372
> patch at 1000      :   41188   106569
>
> The reading at 1000 was not produced under the same circumstances as the prior
> readings as I had to replace my device under test with a simulated one. The
> real one died.
>
> The previous run with smaller database and 120 second averages:
>
> Context switches         avg      max
>
> Default 7.4.1 code :   10665    69470
> Default patch - 10 :   17297    21929
> patch at 100       :   26825    87073
> patch at 1000      :   37580   110849
--
Dave Cramer
519 939 0336
ICQ # 14675561

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	pg(at)fastcrypt(dot)com
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-02 21:46:49
Message-ID:	20040502154649.3cb2f283@thunder.mshome.net
Lists:	pgsql-performance

When grilled further on (Sun, 02 May 2004 11:39:22 -0400),
Dave Cramer <pg(at)fastcrypt(dot)com> confessed:

> Robert,
>
> The real question is: does it help under real-life circumstances?

I'm not yet at the point where the CS's are causing appreciable delays. I
should get there early this week and will be able to measure the relief your
patch may provide.

>
> Did you run the tests with Tom's SQL code that is designed to produce a high
> rate of context switches?

No, I'm using my queries/data.

Cheers,
Rob

--
10:44:58 up 3 days, 17:30, 4 users, load average: 2.00, 2.04, 2.01
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	pg(at)fastcrypt(dot)com
Cc:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 01:20:20
Message-ID:	200405200120.i4K1KKm06350@candle.pha.pa.us
Lists:	pgsql-performance

Did we ever come to a conclusion about excessive SMP context switching
under load?

---------------------------------------------------------------------------

Dave Cramer wrote:
> Robert,
>
> The real question is: does it help under real-life circumstances?
>
> Did you run the tests with Tom's SQL code that is designed to produce a high
> rate of context switches?
>
> Dave
> On Sun, 2004-05-02 at 11:20, Robert Creager wrote:
> > Found some co-workers at work yesterday to load up my library...
> >
> > The sample period is 5 minutes long (vs 2 minutes previously):
> >
> > Context switches         avg      max
> >
> > Default 7.4.1 code :   48784   107354
> > Default patch - 10 :   20400    28160
> > patch at 100       :   38574    85372
> > patch at 1000      :   41188   106569
> >
> > The reading at 1000 was not produced under the same circumstances as the prior
> > readings as I had to replace my device under test with a simulated one. The
> > real one died.
> >
> > The previous run with smaller database and 120 second averages:
> >
> > Context switches         avg      max
> >
> > Default 7.4.1 code :   10665    69470
> > Default patch - 10 :   17297    21929
> > patch at 100       :   26825    87073
> > patch at 1000      :   37580   110849
> --
> Dave Cramer
> 519 939 0336
> ICQ # 14675561
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pg(at)fastcrypt(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 01:59:26
Message-ID:	20040519195926.3c5fa3bc@thunder.mshome.net
Lists:	pgsql-performance

When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> confessed:

>
> Did we ever come to a conclusion about excessive SMP context switching
> under load?
>

I just figured out what was causing the problem on my system Monday. I'm using
the pg_autovacuum daemon, and it was not vacuuming my db. I've no idea why and
didn't get a chance to investigate.

This lack of vacuuming was causing a huge number of context switches and query
delays. The queries that normally take 0.1 seconds were taking 11 seconds, and
the context switches were averaging 160k/s, peaking at 190k/s.

Unfortunately, I was under pressure to fix the db at the time so I didn't get a
chance to play with the patch.

I restarted the vacuum daemon, and will keep an eye on it to see if it behaves.

If the problem recurs, is it worthwhile to try the different patch
delay settings?

Cheers,
Rob

--
19:45:40 up 21 days, 2:30, 4 users, load average: 2.03, 2.09, 2.06
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004
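
For anyone wanting to collect comparable CS/s figures on Linux, one option is
to sample the cumulative "ctxt" counter in /proc/stat twice and divide the
difference by the interval. The message above does not say which tool
produced its numbers, so this is only a sketch of one possible method, and
the helper names are made up for illustration. Reading the file itself is
left out; the parser works on text already loaded into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the "ctxt <count>" line in /proc/stat-style text and return the
 * count, or -1 if no such line is present.
 */
static long long parse_ctxt(const char *stat_text)
{
    assert(stat_text != NULL);
    const char *p = stat_text;
    while (p != NULL && *p != '\0') {
        unsigned long long value;
        if (strncmp(p, "ctxt ", 5) == 0 && sscanf(p + 5, "%llu", &value) == 1)
            return (long long)value;
        p = strchr(p, '\n');      /* advance to the start of the next line */
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Average context switches per second between two counter samples. */
static double cs_per_second(long long before, long long after, double seconds)
{
    assert(seconds > 0.0);
    return (double)(after - before) / seconds;
}
```

Two samples taken 300 seconds apart would give an average comparable to a
5-minute run; a peak figure needs a shorter sampling interval.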

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pg(at)fastcrypt(dot)com, Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 02:41:26
Message-ID:	24373.1085020886@sss.pgh.pa.us
Lists:	pgsql-performance

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Did we ever come to a conclusion about excessive SMP context switching
> under load?

Yeah: it's bad.

Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign
that causes less load on the BufMgrLock. However, the traditional
solution to too-much-contention-for-a-lock is to break up the locked
data structure into finer-grained units, which means *more* lock
operations in total. Normally you expect that the finer-grained lock
units will mean less contention. But given that the issue here seems to
be trading physical ownership of the lock's cache line back and forth,
I'm afraid that the traditional approach would actually make things
worse. The SMP issue seems to be not with whether there is
instantaneous contention for the locked datastructure, but with the cost
of making it possible for processor B to acquire a lock recently held by
processor A.

regards, tom lane
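
To make the cache-line point above concrete, here is a minimal sketch of a
test-and-test-and-set lock written with C11 atomics. It is not the code in
s_lock.c, and the type and function names are hypothetical. The thing to
notice is that the atomic exchange needs the lock's cache line in exclusive
state on the acquiring processor, so even an uncontended acquire by
processor B right after processor A released the lock has to move the line
between the two. Reading before exchanging only spares the waiters from
stealing the line while the lock is held; it does nothing about the handoff
cost itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_bool held;
} sketch_lock;

static void sketch_lock_init(sketch_lock *l)
{
    atomic_init(&l->held, false);
}

/* Try once to take the lock; returns true on success. */
static bool sketch_try_lock(sketch_lock *l)
{
    /* Plain read first: waiters can share the cache line while it is held. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    /* The exchange needs exclusive ownership of the cache line. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void sketch_unlock(sketch_lock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Splitting one such lock into many finer-grained ones multiplies the number
of exchanges, which is why the traditional remedy may not help when the
handoff is the expensive part.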

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pg(at)fastcrypt(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 02:42:26
Message-ID:	24405.1085020946@sss.pgh.pa.us
Lists:	pgsql-performance

Robert Creager <Robert_Creager(at)LogicalChaos(dot)org> writes:
> I just figured out what was causing the problem on my system Monday.
> I'm using the pg_autovacuum daemon, and it was not vacuuming my db.

Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

regards, tom lane

From:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pg(at)fastcrypt(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 02:59:21
Message-ID:	20040519205921.46510067@thunder.mshome.net
Lists:	pgsql-performance

When grilled further on (Wed, 19 May 2004 22:42:26 -0400),
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> confessed:

> Robert Creager <Robert_Creager(at)LogicalChaos(dot)org> writes:
> > I just figured out what was causing the problem on my system Monday.
> > I'm using the pg_autovacuum daemon, and it was not vacuuming my db.
>
> Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

No. I'm still running 7.4.1 w/associated contrib. I guess an upgrade is in
order then. I'm currently downloading 7.4.2 to see what the change is that I
need. Is it just the 7.4.2 pg_autovacuum that is needed here?

I've caught a whiff that 7.4.3 is nearing release? Any idea when?

Thanks,
Rob

--
20:45:52 up 21 days, 3:30, 4 users, load average: 2.02, 2.05, 2.05
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pg(at)fastcrypt(dot)com, Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 03:02:16
Message-ID:	200405200302.i4K32G721749@candle.pha.pa.us
Lists:	pgsql-performance

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Did we ever come to a conclusion about excessive SMP context switching
> > under load?
>
> Yeah: it's bad.
>
> Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign
> that causes less load on the BufMgrLock. However, the traditional
> solution to too-much-contention-for-a-lock is to break up the locked
> data structure into finer-grained units, which means *more* lock
> operations in total. Normally you expect that the finer-grained lock
> units will mean less contention. But given that the issue here seems to
> be trading physical ownership of the lock's cache line back and forth,
> I'm afraid that the traditional approach would actually make things
> worse. The SMP issue seems to be not with whether there is
> instantaneous contention for the locked datastructure, but with the cost
> of making it possible for processor B to acquire a lock recently held by
> processor A.

I see. I don't even see a TODO in there. :-(

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pg(at)fastcrypt(dot)com, Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 03:58:56
Message-ID:	25437.1085025536@sss.pgh.pa.us
Lists:	pgsql-performance

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> ... The SMP issue seems to be not with whether there is
>> instantaneous contention for the locked datastructure, but with the cost
>> of making it possible for processor B to acquire a lock recently held by
>> processor A.

> I see. I don't even see a TODO in there. :-(

Nothing more specific than "investigate SMP context switching issues",
anyway. We are definitely in a research mode here, rather than an
engineering mode.

ObQuote: "Research is what I am doing when I don't know what I am
doing." - attributed to Wernher von Braun, but has anyone got a
definitive reference?

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pg(at)fastcrypt(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 04:02:41
Message-ID:	25480.1085025761@sss.pgh.pa.us
Lists:	pgsql-performance

Robert Creager <Robert_Creager(at)LogicalChaos(dot)org> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> confessed:
>> Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

> No. I'm still running 7.4.1 w/associated contrib. I guess an upgrade is in
> order then. I'm currently downloading 7.4.2 to see what the change is that I
> need. Is it just the 7.4.2 pg_autovacuum that is needed here?

Nope, the fixes I was thinking about just missed the 7.4.2 release.
I think you can only get them from CVS. (Maybe we should offer a
nightly build of the latest stable release branch, not only development
tip...)

> I've caught a whiff that 7.4.3 is nearing release? Any idea when?

Not scheduled yet, but there was talk of pushing one out before 7.5 goes
into feature freeze.

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pg(at)fastcrypt(dot)com, Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 04:11:05
Message-ID:	200405200411.i4K4B5F03803@candle.pha.pa.us
Lists:	pgsql-performance

OK, added to TODO:

* Investigate SMP context switching issues

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> ... The SMP issue seems to be not with whether there is
> >> instantaneous contention for the locked datastructure, but with the cost
> >> of making it possible for processor B to acquire a lock recently held by
> >> processor A.
>
> > I see. I don't even see a TODO in there. :-(
>
> Nothing more specific than "investigate SMP context switching issues",
> anyway. We are definitely in a research mode here, rather than an
> engineering mode.
>
> ObQuote: "Research is what I am doing when I don't know what I am
> doing." - attributed to Wernher von Braun, but has anyone got a
> definitive reference?
>
> regards, tom lane
>

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, pg(at)fastcrypt(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 04:11:57
Message-ID:	200405200411.i4K4BvV03958@candle.pha.pa.us
Lists:	pgsql-performance

Tom Lane wrote:
> Robert Creager <Robert_Creager(at)LogicalChaos(dot)org> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> confessed:
> >> Do you have the post-7.4.2 datatype fixes for pg_autovacuum?
>
> > No. I'm still running 7.4.1 w/associated contrib. I guess an upgrade is in
> > order then. I'm currently downloading 7.4.2 to see what the change is that I
> > need. Is it just the 7.4.2 pg_autovacuum that is needed here?
>
> Nope, the fixes I was thinking about just missed the 7.4.2 release.
> I think you can only get them from CVS. (Maybe we should offer a
> nightly build of the latest stable release branch, not only development
> tip...)
>
> > I've caught a whiff that 7.4.3 is nearing release? Any idea when?
>
> Not scheduled yet, but there was talk of pushing one out before 7.5 goes
> into feature freeze.

We need the temp table autovacuum fix before we do 7.4.3.

From:	Christopher Browne <cbbrowne(at)acm(dot)org>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 04:48:48
Message-ID:	m3zn83ema7.fsf@wolfe.cbbrowne.com
Lists:	pgsql-performance

In an attempt to throw the authorities off his trail, tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) transmitted:
> ObQuote: "Research is what I am doing when I don't know what I am
> doing." - attributed to Wernher von Braun, but has anyone got a
> definitive reference?

<http://www.quotationspage.com/search.php3?Author=Wernher+von+Braun&file=other>

That points to a bunch of seemingly authoritative sources...
--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lsf.html
"Terrrrrific." -- Ford Prefect

From:	"Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To:	Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pg(at)fastcrypt(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 05:10:14
Message-ID:	1085029814.32765.10.camel@zedora2
Lists:	pgsql-performance

On Wed, 2004-05-19 at 21:59, Robert Creager wrote:
> When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> confessed:
>
> >
> > Did we ever come to a conclusion about excessive SMP context switching
> > under load?
> >
>
> I just figured out what was causing the problem on my system Monday. I'm using
> the pg_autovacuum daemon, and it was not vacuuming my db. I've no idea why and
> didn't get a chance to investigate.

Strange. There is a known bug in the 7.4.2 version of pg_autovacuum
related to data type mismatches which is fixed in CVS. But that bug
doesn't cause pg_autovacuum to stop vacuuming but rather to vacuum too
often. So perhaps this is a different issue? Please let me know what
you find.

Thanks,

Matthew O'Connor

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pg(at)fastcrypt(dot)com, Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 21:52:07
Message-ID:	200405201452.07833.josh@agliodbs.com
Lists:	pgsql-performance

Guys,

> Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign
> that causes less load on the BufMgrLock.

FWIW, we've been pursuing two routes of quick patch fixes.

1) Dave Cramer and I have been testing setting varying rates of spin_delay in
an effort to find a "sweet spot" that the individual system seems to like.
This has been somewhat delayed by my illness.

2) The OSDL folks have been trying various patches to use Linux 2.6 futexes in
place of semops (if I have that right) which, if successful, would produce a
Linux-specific fix. However, they haven't yet come up with a version of
the patch which is stable.

I'm really curious, BTW, about how all of Jan's changes to buffer usage in 7.5
affect this issue. Has anyone tested it on a recent snapshot?

--
-Josh Berkus
Aglio Database Solutions
San Francisco
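
For readers who have not looked at the spin-delay idea in route 1, the
general shape is: retry the atomic test-and-set a fixed number of times, then
sleep briefly before trying again. Each sleep is a voluntary context switch,
which is why the spin count moves the CS/s numbers. The sketch below is not
the patch being tested and not the s_lock.c code; spin_acquire,
spins_per_delay, and delay_usec are illustrative names, and nanosleep stands
in for whatever sleep call the real code uses.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/*
 * Spin on the flag; after every spins_per_delay failed attempts, sleep for
 * delay_usec microseconds. Returns how many times the caller slept.
 */
static int spin_acquire(atomic_flag *lock, int spins_per_delay, long delay_usec)
{
    assert(spins_per_delay > 0 && delay_usec >= 0);
    int spins = 0;
    int sleeps = 0;

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        if (++spins >= spins_per_delay) {
            struct timespec ts = {
                .tv_sec = delay_usec / 1000000L,
                .tv_nsec = (delay_usec % 1000000L) * 1000L,
            };
            nanosleep(&ts, NULL);   /* gives up the CPU: one context switch */
            spins = 0;
            sleeps++;
        }
    }
    return sleeps;
}

static void spin_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

A small spin count gives up the CPU quickly and trades spinning for context
switches; a large one does the opposite. Where the "sweet spot" lies would
have to be measured on the individual system.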

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	josh(at)agliodbs(dot)com
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pg(at)fastcrypt(dot)com, Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>, Dirk_Lutzebäck <lutzeb(at)aeccom(dot)com>, ohp(at)pyrenet(dot)fr, Joe Conway <mail(at)joeconway(dot)com>, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>
Subject:	Re: Wierd context-switching issue on Xeon
Date:	2004-05-20 22:14:52
Message-ID:	24678.1085091292@sss.pgh.pa.us
Lists:	pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I'm really curious, BTW, about how all of Jan's changes to buffer
> usage in 7.5 affect this issue. Has anyone tested it on a recent
> snapshot?

Won't help.

(1) Theoretical argument: the problem case is select-only and touches
few enough buffers that it need never visit the kernel. The buffer
management algorithm is thus irrelevant since there are never any
decisions for it to make. If anything CVS tip will have a worse problem
because its more complicated management algorithm needs to spend longer
holding the BufMgrLock.

(2) Experimental argument: I believe that I did check the self-contained
test case we eventually developed against CVS tip on one of Red Hat's
SMP machines, and indeed it was unhappy.

regards, tom lane