Re: Strange behavior: pgbench and new Linux kernels

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange behavior: pgbench and new Linux kernels
Date: 2009-04-04 16:07:58
Message-ID: alpine.GSO.2.01.0904041147540.27286@westnet.com
Lists: pgsql-performance

On Tue, 31 Mar 2009, Kevin Grittner wrote:

>>>> On Thu, Apr 17, 2008 at 7:26 PM, Greg Smith wrote:
>
>> On this benchmark 2.6.25 is the worst kernel yet:
>
> I don't remember seeing a follow-up on this issue from last year.
> Are there still any particular kernels to avoid based on this?

I just discovered something really fascinating here. The problem is
strictly limited to when you're connecting via Unix-domain sockets; use
TCP/IP instead, and it goes away.

To refresh everyone's memory, I reported this problem to the LKML at
http://lkml.org/lkml/2008/5/21/292 and got some patches and some kernel
tweaks for the scheduler, but never a clear resolution for the cause, which
kept anybody from getting too excited about merging anything. Test results
comparing various tweaks on the hardware I'm still using now are at
http://lkml.org/lkml/2008/5/26/288

For example, here's kernel 2.6.25 running pgbench with 50 clients on a
Q6000 processor, demonstrating poor performance. I'd get >20K TPS here
with a pre-CFS kernel:

$ pgbench -S -t 4000 -c 50 -n pgbench
transaction type: SELECT only
scaling factor: 10
query mode: simple
number of clients: 50
number of transactions per client: 4000
number of transactions actually processed: 200000/200000
tps = 8288.047442 (including connections establishing)
tps = 8319.702195 (excluding connections establishing)

If I now execute exactly the same test, but connect over TCP/IP by adding
"-h localhost", performance returns to normal:

$ pgbench -S -t 4000 -c 50 -n -h localhost pgbench
transaction type: SELECT only
scaling factor: 10
query mode: simple
number of clients: 50
number of transactions per client: 4000
number of transactions actually processed: 200000/200000
tps = 17575.277771 (including connections establishing)
tps = 17724.651090 (excluding connections establishing)

That's 100% repeatable; I ran each test several times each way.
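
For anyone who wants to repeat the comparison, here is a minimal sketch of
how the two runs above could be scripted. The function names, the RUNS and
RUN_COMPARISON variables, and the default of three runs are assumptions made
for this illustration, not part of the original tests; the pgbench options
are the ones shown above. The parsing matches the output format printed in
this message, and other pgbench versions may word the tps lines differently.

```shell
#!/bin/sh
# Pull the "excluding connections establishing" tps figure out of
# pgbench output read from stdin.
extract_tps() {
    awk '/^tps = .*excluding connections establishing/ { print $3 }'
}

# Run the same select-only test over the Unix-domain socket and over
# TCP/IP to localhost, several times each way.
# RUNS is a hypothetical knob for this sketch; "pgbench" is the database
# name used in the commands above.
compare_transports() {
    runs=${RUNS:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        printf 'socket: %s\n' \
            "$(pgbench -S -t 4000 -c 50 -n pgbench | extract_tps)"
        printf 'tcp:    %s\n' \
            "$(pgbench -S -t 4000 -c 50 -n -h localhost pgbench | extract_tps)"
        i=$((i + 1))
    done
}

# Only touch the server when explicitly asked to.
if [ -n "${RUN_COMPARISON:-}" ]; then
    compare_transports
fi
```

Running it as "RUN_COMPARISON=1 RUNS=5 sh compare.sh" (compare.sh being
whatever the file is saved as) prints one socket figure and one TCP figure
per iteration, which makes the gap easy to eyeball.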

So the new summary here of what I've found is that if:

1) You're running Linux 2.6.23 or greater (confirmed in up to 2.6.26)
2) You connect over a Unix-domain socket
3) Your client count is relatively high (>8 clients/core)

You can expect your pgbench results to tank. Switch to connecting over
TCP/IP to localhost, and everything is fine; it's not quite as fast as the
pre-CFS kernels in some cases, but in others it's faster.
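
As a rough aid for checking condition 3 on a given machine, the sketch below
compares a client count against the 8 clients/core threshold and prints the
running kernel version for reference. The helper name and the CLIENTS
variable are hypothetical, and the core count comes from getconf, which is
an assumption about how cores should be counted here.

```shell
#!/bin/sh
# Condition 3 from the list above: more than 8 clients per core.
# Arguments: client count, core count (both supplied by the caller).
high_client_count() {
    [ "$1" -gt $(( 8 * $2 )) ]
}

# Example for the local machine. CLIENTS is a hypothetical variable for
# this sketch; 50 matches the -c 50 used in the tests above.
clients=${CLIENTS:-50}
cores=$(getconf _NPROCESSORS_ONLN)
echo "kernel: $(uname -r)"
if high_client_count "$clients" "$cores"; then
    echo "$clients clients on $cores cores: above 8 clients/core"
else
    echo "$clients clients on $cores cores: at or below 8 clients/core"
fi
```

This only covers the client-count condition; the kernel version and the
connection type still have to be checked by hand.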

I haven't gotten to testing kernels newer than 2.6.26 yet; when I saw a
17K TPS result during one of my tests on 2.6.25, I screeched to a halt to
isolate this instead.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
