Re: xfs performs a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]

Lists: pgsql-performance
From: Craig James <cjames(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Two identical systems, radically different performance
Date: 2012-10-08 21:45:18
Message-ID: CAFwQ8rcvw4mSoJ9q+ouONCeB9dg1T9gAUV5J27MpVZrehRHayQ@mail.gmail.com

This is driving me crazy. A new server, virtually identical to an old one,
has 50% of the performance with pgbench. I've checked everything I can
think of.

The setups (call the servers "old" and "new"):

old: 2 x 4-core Intel Xeon E5620
new: 4 x 4-core Intel Xeon E5606

both:

memory: 12 GB DDR ECC
Disks: 12x500GB disks (Western Digital 7200RPM SATA)
2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
8 disks, RAID10: $PGDATA

3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
indicates that the battery is charged and the cache is working on both
units.

Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
actually cloned from old server).

Postgres: 8.4.4 (yes, I should update. But both are identical.)

The postgres.conf files are identical; diffs from the original are:

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
track_activities = on
track_counts = on
track_functions = none
autovacuum = on
autovacuum_naptime = 5min
escape_string_warning = off

Note that the old server is in production and was serving a light load
while this test was running, so in theory it should be slower, not faster,
than the new server.

pgbench: Old server

pgbench -i -s 100 -U test
pgbench -U test -c ... -t ...

-c -t TPS
5 20000 3777
10 10000 2622
20 5000 3759
30 3333 5712
40 2500 5953
50 2000 6141

New server
-c -t TPS
5 20000 2733
10 10000 2783
20 5000 3241
30 3333 2987
40 2500 2739
50 2000 2119

As you can see, the new server is dramatically slower than the old one.

I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
The xlog disks were almost identical in performance. The RAID10 pg-data
disks looked like this:

Old server:
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
Latency 20512us 469ms 394ms 21402us 396ms 112ms
Version 1.96 ------Sequential Create------ --------Random Create--------
xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 43291us 857us 519us 1588us 37us 178us
1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us

New server:
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
Latency 15613us 598ms 597ms 2764us 398ms 215ms
Version 1.96 ------Sequential Create------ --------Random Create--------
zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 487us 627us 407us 972us 29us 262us
1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us

I don't know enough about bonnie++ to know if these differences are
interesting.

One dramatic difference I noted via vmstat. On the old server, the I/O
load during the bonnie++ run was steady, like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12

But the new server varied wildly during bonnie++:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3

Any ideas where to look next would be greatly appreciated.

Craig


From: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 21:57:24
Message-ID: 4CB0E3B6-CF1E-4B54-9900-E8356744F0AA@gmail.com


On Oct 9, 2012, at 1:45 AM, Craig James <cjames(at)emolecules(dot)com> wrote:

> This is driving me crazy. A new server, virtually identical to an old one, has 50% of the performance with pgbench. I've checked everything I can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
> both:
>
> memory: 12 GB DDR EC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both units.
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
> actually cloned from old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgres.conf files are identical; diffs from the original are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB

wal_buffers seems very small. Simon suggests setting it to at least 16MB, as sketched below.
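
For example, a minimal postgresql.conf change along those lines (16MB is the
suggestion above, not a value tuned for this workload; on 8.4 it takes a
server restart to pick up):

wal_buffers = 16MB    # was 256kB; sized to one full WAL segment
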
> checkpoint_segments = 30
> effective_cache_size = 4GB

You have 12GB of RAM; effective_cache_size = 4GB looks low for that.
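
A common rule of thumb is 2/3 to 3/4 of RAM; a sketch under that assumption
(effective_cache_size is only a planner hint - no memory is actually
allocated):

effective_cache_size = 8GB    # was 4GB on a 12GB machine
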
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
>
> Note that the old server is in production and was serving a light load while this test was running, so in theory it should be slower, not faster, than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c -t TPS
> 5 20000 3777
> 10 10000 2622
> 20 5000 3759
> 30 3333 5712
> 40 2500 5953
> 50 2000 6141
>
> New server
> -c -t TPS
> 5 20000 2733
> 10 10000 2783
> 20 5000 3241
> 30 3333 2987
> 40 2500 2739
> 50 2000 2119
>
> As you can see, the new server is dramatically slower than the old one.
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++. The xlog disks were almost identical in performance. The RAID10 pg-data disks looked like this:
>
> Old server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
> Latency 20512us 469ms 394ms 21402us 396ms 112ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 43291us 857us 519us 1588us 37us 178us
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,++\
> +,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
> Latency 15613us 598ms 597ms 2764us 398ms 215ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 487us 627us 407us 972us 29us 262us
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++\
> ,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us

Sequential Input on the new one is 279MB/s, on the old 400MB/s.

> I don't know enough about bonnie++ to know if these differences are interesting.
>
> One dramatic difference I noted via vmstat. On the old server, the I/O load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
> 0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
> 0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
> 0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
> 0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
> 1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
> 0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
> 1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
> 0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
> 0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
> 0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
> 0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
> 0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
> 0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
> 0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
> 0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
> 0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
> 0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
> 0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
> 1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
> 1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
> 0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3
>
> Any ideas where to look next would be greatly appreciated.
>
> Craig
>


From: Craig James <cjames(at)emolecules(dot)com>
To: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:06:05
Message-ID: CAFwQ8rd1TTLh-BfMf-Uc2Vv_8GuP2uQqWAa6-FrtmJUhy3LVxQ@mail.gmail.com

On Mon, Oct 8, 2012 at 2:57 PM, Evgeny Shishkin <itparanoia(at)gmail(dot)com>wrote:

>
> On Oct 9, 2012, at 1:45 AM, Craig James <cjames(at)emolecules(dot)com> wrote:
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
> The xlog disks were almost identical in performance. The RAID10 pg-data
> disks looked like this:
>
> Old server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
> Latency 20512us 469ms 394ms 21402us 396ms 112ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 43291us 857us 519us 1588us 37us 178us
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
> Latency 15613us 598ms 597ms 2764us 398ms 215ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 487us 627us 407us 972us 29us 262us
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
>
> Sequential Input on the new one is 279MB/s, on the old 400MB/s.
>
>
But why? What have I overlooked?

Thanks,
Craig


From: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:08:29
Message-ID: 099BB618-4F80-4200-9564-4F06231D8D57@gmail.com


On Oct 9, 2012, at 2:06 AM, Craig James <cjames(at)emolecules(dot)com> wrote:

>
>
> On Mon, Oct 8, 2012 at 2:57 PM, Evgeny Shishkin <itparanoia(at)gmail(dot)com> wrote:
>
> On Oct 9, 2012, at 1:45 AM, Craig James <cjames(at)emolecules(dot)com> wrote:
>
>> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++. The xlog disks were almost identical in performance. The RAID10 pg-data disks looked like this:
>>
>> Old server:
>> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
>> Latency 20512us 469ms 394ms 21402us 396ms 112ms
>> Version 1.96 ------Sequential Create------ --------Random Create--------
>> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
>> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>> Latency 43291us 857us 519us 1588us 37us 178us
>> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,++\
>> +,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>>
>>
>> New server:
>> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
>> Latency 15613us 598ms 597ms 2764us 398ms 215ms
>> Version 1.96 ------Sequential Create------ --------Random Create--------
>> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
>> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>> Latency 487us 627us 407us 972us 29us 262us
>> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++\
>> ,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> Sequential Input on the new one is 279MB/s, on the old 400MB/s.
>
>
> But why? What have I overlooked?

blockdev --setra 32000 ?
Also, you benchmarked the volume for pgdata; can you provide benchmarks for
the wal volume?
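
A quick sequential check of the wal volume could look like this (the path is
only an example; it writes a 4GB scratch file, so remove it afterwards):

time sh -c "dd if=/dev/zero of=/path/to/xlog/testfile bs=8192 count=524288 && sync"
rm /path/to/xlog/testfile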
>
> Thanks,
> Craig
>


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:09:45
Message-ID: CAGTBQpZnkU7n4iAhiNzN3rsMW2WT+H6YpN7c7bkNbfj+3hr6NA@mail.gmail.com

On Mon, Oct 8, 2012 at 7:06 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>> Sequential Input on the new one is 279MB/s, on the old 400MB/s.
>>
>
> But why? What have I overlooked?

Do you have readahead properly set up on the new one?


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:16:05
Message-ID: 507350A5.2080503@pinpointresearch.com

On 10/08/2012 02:45 PM, Craig James wrote:
> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything
> I can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
> both:
>
> memory: 12 GB DDR EC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
Exact same model of disk, same on-board cache, same RAID-card RAM size,
same RAID stripe size, etc.?

Cheers,
Steve


From: Craig James <cjames(at)emolecules(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:25:43
Message-ID: CAFwQ8rfVgRXQLaHULbvpD65+7jYsAt9ANqujOwnksFLGYHkZew@mail.gmail.com

On Mon, Oct 8, 2012 at 3:09 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>wrote:

> On Mon, Oct 8, 2012 at 7:06 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
> >> Sequential Input on the new one is 279MB/s, on the old 400MB/s.
> >>
> >
> > But why? What have I overlooked?
>
> Do you have readahead properly set up on the new one?
>

# blockdev --getra /dev/sdb1
256

Same on both servers.

Thanks,
Craig


From: Imre Samu <pella(dot)samu(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:28:37
Message-ID: CAJnEWwnL1MyP4MicE-QV71ZQuQ_d48rTVc6qptghFhLUcWXDRw@mail.gmail.com

>old: 2 x 4-core Intel Xeon E5620
>new: 4 x 4-core Intel Xeon E5606

http://ark.intel.com/compare/47925,52583

old: Xeon E5620: 4 cores; 8 threads; clock speed: 2.40 GHz; max turbo frequency: 2.66 GHz
new: Xeon E5606: 4 cores; 4 threads; clock speed: 2.13 GHz; max turbo frequency: none

the older processor may be faster;

Imre

2012/10/8 Craig James <cjames(at)emolecules(dot)com>

> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything I
> can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
> both:
>
> memory: 12 GB DDR EC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both
> units.
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
> actually cloned from old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgres.conf files are identical; diffs from the original are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
>
> Note that the old server is in production and was serving a light load
> while this test was running, so in theory it should be slower, not faster,
> than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c -t TPS
> 5 20000 3777
> 10 10000 2622
> 20 5000 3759
> 30 3333 5712
> 40 2500 5953
> 50 2000 6141
>
> New server
> -c -t TPS
> 5 20000 2733
> 10 10000 2783
> 20 5000 3241
> 30 3333 2987
> 40 2500 2739
> 50 2000 2119
>
> As you can see, the new server is dramatically slower than the old one.
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
> The xlog disks were almost identical in performance. The RAID10 pg-data
> disks looked like this:
>
> Old server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
> Latency 20512us 469ms 394ms 21402us 396ms 112ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 43291us 857us 519us 1588us 37us 178us
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
> Latency 15613us 598ms 597ms 2764us 398ms 215ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 487us 627us 407us 972us 29us 262us
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to know if these differences are
> interesting.
>
> One dramatic difference I noted via vmstat. On the old server, the I/O
> load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
> 0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
> 0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
> 0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
> 0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
> 1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
> 0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
> 1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
> 0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
> 0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
> 0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
> 0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
> 0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
> 0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
> 0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
> 0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
> 0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
> 0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
> 0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
> 1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
> 1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
> 0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3
>
> Any ideas where to look next would be greatly appreciated.
>
> Craig
>
>


From: Craig James <cjames(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:29:17
Message-ID: CAFwQ8rfm2AcuoFHDfBLA_hg7ffbsYNsrsaJPDqcOeGqaVW02OQ@mail.gmail.com

One mistake in my descriptions...

On Mon, Oct 8, 2012 at 2:45 PM, Craig James <cjames(at)emolecules(dot)com> wrote:

> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything I
> can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>

Actually it's not 16 cores. It's 8 cores, hyperthreaded. Hyperthreading
is disabled on the old system.

Is that enough to make this radical difference? (The server is at a
co-location site, so I have to go down there to boot into the BIOS and
disable hyperthreading.)
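
A possible shortcut, assuming CPU hotplug works on this kernel (2.6.32
should support it): the HT siblings can be taken offline at runtime instead
of a BIOS visit, e.g.

# list core/HT-twin pairs; the second CPU in each pair is the HT sibling
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | sort -u
# take an HT twin offline, e.g. cpu8; repeat for each pair
echo 0 > /sys/devices/system/cpu/cpu8/online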

Craig

>
> both:
>
> memory: 12 GB DDR EC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both
> units.
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
> actually cloned from old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgres.conf files are identical; diffs from the original are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
>
> Note that the old server is in production and was serving a light load
> while this test was running, so in theory it should be slower, not faster,
> than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c -t TPS
> 5 20000 3777
> 10 10000 2622
> 20 5000 3759
> 30 3333 5712
> 40 2500 5953
> 50 2000 6141
>
> New server
> -c -t TPS
> 5 20000 2733
> 10 10000 2783
> 20 5000 3241
> 30 3333 2987
> 40 2500 2739
> 50 2000 2119
>
> As you can see, the new server is dramatically slower than the old one.
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
> The xlog disks were almost identical in performance. The RAID10 pg-data
> disks looked like this:
>
> Old server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
> Latency 20512us 469ms 394ms 21402us 396ms 112ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 43291us 857us 519us 1588us 37us 178us
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
> Latency 15613us 598ms 597ms 2764us 398ms 215ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 487us 627us 407us 972us 29us 262us
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to know if these differences are
> interesting.
>
> One dramatic difference I noted via vmstat. On the old server, the I/O
> load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
> 0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
> 0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
> 0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
> 0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
> 1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
> 0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
> 1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
> 0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
> 0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
> 0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
> 0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
> 0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
> 0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
> 0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
> 0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
> 0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
> 0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
> 0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
> 1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
> 1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
> 0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3
>
> Any ideas where to look next would be greatly appreciated.
>
> Craig
>
>


From: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:33:56
Message-ID: B5EA62D3-A446-4A1F-BC4E-4B8290FEE1C2@gmail.com


On Oct 9, 2012, at 1:45 AM, Craig James <cjames(at)emolecules(dot)com> wrote:

> This is driving me crazy. A new server, virtually identical to an old one, has 50% of the performance with pgbench. I've checked everything I can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
> both:
>
> memory: 12 GB DDR EC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both units.
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
> actually cloned from old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgres.conf files are identical; diffs from the original are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
>
> Note that the old server is in production and was serving a light load while this test was running, so in theory it should be slower, not faster, than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c -t TPS
> 5 20000 3777
> 10 10000 2622
> 20 5000 3759
> 30 3333 5712
> 40 2500 5953
> 50 2000 6141
>
> New server
> -c -t TPS
> 5 20000 2733
> 10 10000 2783
> 20 5000 3241
> 30 3333 2987
> 40 2500 2739
> 50 2000 2119

On the new server PostgreSQL does not scale at all. Looks like contention.

>
> As you can see, the new server is dramatically slower than the old one.
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++. The xlog disks were almost identical in performance. The RAID10 pg-data disks looked like this:
>
> Old server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
> Latency 20512us 469ms 394ms 21402us 396ms 112ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 43291us 857us 519us 1588us 37us 178us
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,++\
> +,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
> Latency 15613us 598ms 597ms 2764us 398ms 215ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 487us 627us 407us 972us 29us 262us
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++\
> ,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to know if these differences are interesting.
>
> One dramatic difference I noted via vmstat. On the old server, the I/O load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
> 0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
> 0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
> 0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
> 0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
> 1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
> 0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
> 1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
> 0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
> 0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
> 0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
> 0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
> 0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
> 0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
> 0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
> 0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
> 0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
> 0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
> 0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
> 1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
> 1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
> 0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3
>

Also note the difference in the free/cache distribution, unless you took these numbers at completely different stages of bonnie++.

> Any ideas where to look next would be greatly appreciated.
>
> Craig
>


From: Craig James <cjames(at)emolecules(dot)com>
To: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:42:33
Message-ID: CAFwQ8rcXE3=2KLgYKO_x+7cMpttwabV0EwEeDSoVZWP7Kx6RGg@mail.gmail.com

On Mon, Oct 8, 2012 at 3:33 PM, Evgeny Shishkin <itparanoia(at)gmail(dot)com>wrote:

>
> On Oct 9, 2012, at 1:45 AM, Craig James <cjames(at)emolecules(dot)com> wrote:
>
> One dramatic difference I noted via vmstat. On the old server, the I/O
> load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
> 0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
> 0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
> 0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
> 0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
> 1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
> 0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
> 1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
> 0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
> 0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
> 0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
> 0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
> 0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
> 0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
> 0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
> 0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
> 0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
> 0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
> 0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
> 1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
> 1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
> 0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3
>
>
> Also note the difference in the free/cache distribution, unless you took
> these numbers at completely different stages of bonnie++.
>
>
The old server is in production and is running Apache/Postgres requests.

Craig


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:44:04
Message-ID: CAGTBQpbFSWPR1Z9NiN_B6amhONjyt9PvoT13Z=SdPN8GoKZi2Q@mail.gmail.com

On Mon, Oct 8, 2012 at 7:25 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>> > But why? What have I overlooked?
>>
>> Do you have readahead properly set up on the new one?
>
>
> # blockdev --getra /dev/sdb1
> 256

It's probably this. 256 is way too low to saturate your I/O system.
Pump it up. I've found 8192 works nice for a system I have, 32000 I
guess could work too.
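
For reference, a sketch of checking and raising it (the device name is an
example; the unit is 512-byte sectors, and the setting does not survive a
reboot unless reapplied from a boot script):

blockdev --getra /dev/sdb1        # current readahead, in sectors
blockdev --setra 8192 /dev/sdb1   # 8192 sectors = 4MB readahead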


From: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:46:38
Message-ID: A62086F6-67AE-441F-85A5-9C26EF59EBC4@gmail.com


On Oct 9, 2012, at 2:44 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:

> On Mon, Oct 8, 2012 at 7:25 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>>>> But why? What have I overlooked?
>>>
>>> Do you have readahead properly set up on the new one?
>>
>>
>> # blockdev --getra /dev/sdb1
>> 256
>
>
> It's probably this. 256 is way too low to saturate your I/O system.
> Pump it up. I've found 8192 works nice for a system I have, 32000 I
> guess could work too.

This. I also suggest re-benchmarking with increased wal_buffers; maybe the falloff at higher client counts comes from WAL mutex contention.


From: Craig James <cjames(at)emolecules(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:48:52
Message-ID: CAFwQ8rdmzO046cXO-mqcsZKsj9Ch_W-mfiJpeQnOx8xob-SMVw@mail.gmail.com

On Mon, Oct 8, 2012 at 3:44 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>wrote:

> On Mon, Oct 8, 2012 at 7:25 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
> >> > But why? What have I overlooked?
> >>
> >> Do you have readahead properly set up on the new one?
> >
> >
> > # blockdev --getra /dev/sdb1
> > 256
>
>
> It's probably this. 256 is way too low to saturate your I/O system.
> Pump it up. I've found 8192 works nice for a system I have, 32000 I
> guess could work too.
>

But again ... the two systems are identical. This can't explain it.

Thanks,
Craig


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:50:30
Message-ID: CAGTBQpaoMLASfgTn-fufwou5-Lu6AybsGfi9a7JHeZZaeL+u6g@mail.gmail.com

On Mon, Oct 8, 2012 at 7:48 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>> > # blockdev --getra /dev/sdb1
>> > 256
>>
>>
>> It's probably this. 256 is way too low to saturate your I/O system.
>> Pump it up. I've found 8192 works nice for a system I have, 32000 I
>> guess could work too.
>
>
> But again ... the two systems are identical. This can't explain it.

Is the read-ahead the same in both systems?


From: Craig James <cjames(at)emolecules(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:03:53
Message-ID: CAFwQ8reOc-ntKsLRSqVOf81FsGRA_OiVbusDduB2BuH_SzdNGQ@mail.gmail.com

On Mon, Oct 8, 2012 at 3:50 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>wrote:

> On Mon, Oct 8, 2012 at 7:48 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
> >> > # blockdev --getra /dev/sdb1
> >> > 256
> >>
> >>
> >> It's probably this. 256 is way too low to saturate your I/O system.
> >> Pump it up. I've found 8192 works nice for a system I have, 32000 I
> >> guess could work too.
> >
> >
> > But again ... the two systems are identical. This can't explain it.
>
> Is the read-ahead the same in both systems?
>

Yes, as I said in the original reply (it got cut off from your reply):
"Same on both servers."

Craig


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:10:54
Message-ID: 50735D7E.7000405@catalyst.net.nz

On 09/10/12 11:48, Craig James wrote:
> On Mon, Oct 8, 2012 at 3:44 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>wrote:
>
>> On Mon, Oct 8, 2012 at 7:25 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>>>>> But why? What have I overlooked?
>>>> Do you have readahead properly set up on the new one?
>>>
>>> # blockdev --getra /dev/sdb1
>>> 256
>>
>> It's probably this. 256 is way too low to saturate your I/O system.
>> Pump it up. I've found 8192 works nice for a system I have, 32000 I
>> guess could work too.
>>
> But again ... the two systems are identical. This can't explain it.
>

Maybe check that all sysctls are the same - in particular:

vm.zone_reclaim_mode

has a tendency to set itself to 1 on newer hardware, which will reduce
the performance of database-style workloads.
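
A minimal sketch of comparing and fixing it (0 disables zone reclaim):

sysctl vm.zone_reclaim_mode                           # compare on both boxes
sysctl -w vm.zone_reclaim_mode=0                      # disable it immediately
echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf   # persist across reboots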

Cheers

Mark


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Evgeny Shishkin <itparanoia(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:12:54
Message-ID: CAGTBQpYZv-fSWx1xaDxkJrE1Vo-SvNKPCjH2JXNdhY0vLH3iRA@mail.gmail.com

On Mon, Oct 8, 2012 at 8:03 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>> > But again ... the two systems are identical. This can't explain it.
>>
>> Is the read-ahead the same in both systems?
>
>
> Yes, as I said in the original reply (it got cut off from your reply): "Same
> on both servers."

Oh, yes. Google collapsed it. Weird.

Anyway, sequential I/O isn't the same in both servers, and usually you
don't get full sequential performance unless you bump up the
read-ahead. I'm still betting on that for the difference in sequential
performance.

As for pgbench, I'm not sure, but I think pgbench doesn't really
stress sequential performance. You seem to be getting bad queueing
performance. Did you check the NCQ status on the RAID controller? Is it
enabled on both servers?
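
One generic place to look on Linux (the device name is an example, and
whether this reflects the drives behind a 3ware unit is another question -
tw_cli may be the more authoritative place to check):

cat /sys/block/sdb/device/queue_depth   # 1 usually means no command queueing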


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:16:15
Message-ID: 50735EBF.5010909@fuzzy.cz

On 9.10.2012 01:03, Craig James wrote:
>
>
> On Mon, Oct 8, 2012 at 3:50 PM, Claudio Freire <klaussfreire(at)gmail(dot)com
> <mailto:klaussfreire(at)gmail(dot)com>> wrote:
>
> On Mon, Oct 8, 2012 at 7:48 PM, Craig James <cjames(at)emolecules(dot)com
> <mailto:cjames(at)emolecules(dot)com>> wrote:
> >> > # blockdev --getra /dev/sdb1
> >> > 256
> >>
> >>
> >> It's probably this. 256 is way too low to saturate your I/O system.
> >> Pump it up. I've found 8192 works nice for a system I have, 32000 I
> >> guess could work too.
> >
> >
> > But again ... the two systems are identical. This can't explain it.
>
> Is the read-ahead the same in both systems?
>
>
> Yes, as I said in the original reply (it got cut off from your reply):
> "Same on both servers."

And what about the read-ahead setting on the controller? 3WARE controllers
used to have a read-ahead setting of their own (usually there are three
options - read-ahead, no read-ahead, and adaptive). Is this set to the same
value on both machines?

Tomas


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:24:02
Message-ID: 50736092.8080906@fuzzy.cz

On 9.10.2012 00:33, Evgeny Shishkin wrote:
>>
>> pgbench: Old server
>>
>> pgbench -i -s 100 -U test
>> pgbench -U test -c ... -t ...
>>
>> -c -t TPS
>> 5 20000 3777
>> 10 10000 2622
>> 20 5000 3759
>> 30 3333 5712
>> 40 2500 5953
>> 50 2000 6141
>>
>> New server
>> -c -t TPS
>> 5 20000 2733
>> 10 10000 2783
>> 20 5000 3241
>> 30 3333 2987
>> 40 2500 2739
>> 50 2000 2119
>
>> On the new server PostgreSQL does not scale at all. Looks like contention.

Why? The evidence we've seen so far IMHO suggests a poorly performing
I/O subsystem. Post a few lines of "vmstat 1" / "iostat -x -k 1"
collected while pgbench is running; that might tell us more.

Try a few very basic I/O tests that are easy to understand rather than
running bonnie++, which is quite complex. For example, try this:

time sh -c "dd if=/dev/zero of=myfile.tmp bs=8192 count=4194304 && sync"

dd if=myfile.tmp of=/dev/null bs=8192

The former measures sequential write speed, the latter measures
sequential read speed in a very primitive way. Watch vmstat/iostat and
don't bother running pgbench until you get a reasonable performance on
both systems.
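
One caveat: drop the page cache between the write and the read, or part of
the read may be served from RAM (run as root):

sync
echo 3 > /proc/sys/vm/drop_caches
dd if=myfile.tmp of=/dev/null bs=8192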

Tomas


From: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:30:31
Message-ID: 1294E917-64C5-4481-8C7F-F00370A96FA9@gmail.com


On Oct 9, 2012, at 3:24 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 9.10.2012 00:33, Evgeny Shishkin wrote:
>>>
>>> pgbench: Old server
>>>
>>> pgbench -i -s 100 -U test
>>> pgbench -U test -c ... -t ...
>>>
>>> -c -t TPS
>>> 5 20000 3777
>>> 10 10000 2622
>>> 20 5000 3759
>>> 30 3333 5712
>>> 40 2500 5953
>>> 50 2000 6141
>>>
>>> New server
>>> -c -t TPS
>>> 5 20000 2733
>>> 10 10000 2783
>>> 20 5000 3241
>>> 30 3333 2987
>>> 40 2500 2739
>>> 50 2000 2119
>>
>> On the new server PostgreSQL does not scale at all. Looks like contention.
>
> Why? The evidence we've seen so far IMHO suggests a poorly performing
> I/O subsystem. Post a few lines of "vmstat 1" / "iostat -x -k 1"
> collected when the pgbench is running, that might tell us more.
>

Because 50 clients can push I/O even with a small read-ahead. And here we see a nice parabola. Just guessing anyway.

> Try a few very basic I/O tests that are easy to understand rather than
> running bonnie++ which is quite complex. For example try this:
>
> time sh -c "dd if=/dev/zero of=myfile.tmp bs=8192 count=4194304 && sync"
>
> dd if=myfile.tmp of=/dev/null bs=8192
>
> The former measures sequential write speed, the latter measures
> sequential read speed in a very primitive way. Watch vmstat/iostat and
> don't bother running pgbench until you get a reasonable performance on
> both systems.
>
>
> Tomas
>
>


From: Craig James <cjames(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:40:31
Message-ID: CAFwQ8rcyCKcoeEn62_QHMRSmJM03ULERtW6_R=3TaO0twOEV9A@mail.gmail.com

Nobody has commented on the hyperthreading question yet ... does it really
matter? The old (fast) server has hyperthreading disabled, and the new
(slower) server has hyperthreads enabled.

If hyperthreading is definitely NOT an issue, it will save me a trip to the
co-lo facility.

Thanks,
Craig

On Mon, Oct 8, 2012 at 3:29 PM, Craig James <cjames(at)emolecules(dot)com> wrote:

> One mistake in my descriptions...
>
> On Mon, Oct 8, 2012 at 2:45 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>
>> This is driving me crazy. A new server, virtually identical to an old
>> one, has 50% of the performance with pgbench. I've checked everything I
>> can think of.
>>
>> The setups (call the servers "old" and "new"):
>>
>> old: 2 x 4-core Intel Xeon E5620
>> new: 4 x 4-core Intel Xeon E5606
>>
>
> Actually it's not 16 cores. It's 8 cores, hyperthreaded. Hyperthreading
> is disabled on the old system.
>
> Is that enough to make this radical difference? (The server is at a
> co-location site, so I have to go down there to boot into the BIOS and
> disable hyperthreading.)
>
> Craig
>


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 23:52:28
Message-ID: 5073673C.10905@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 09/10/12 12:40, Craig James wrote:
> Nobody has commented on the hyperthreading question yet ... does it
> really matter? The old (fast) server has hyperthreading disabled, and
> the new (slower) server has hyperthreads enabled.
>
> If hyperthreading is definitely NOT an issue, it will save me a trip
> to the co-lo facility.
>
> Thanks,
> Craig
>
> On Mon, Oct 8, 2012 at 3:29 PM, Craig James <cjames(at)emolecules(dot)com
> <mailto:cjames(at)emolecules(dot)com>> wrote:
>
> One mistake in my descriptions...
>
> On Mon, Oct 8, 2012 at 2:45 PM, Craig James <cjames(at)emolecules(dot)com
> <mailto:cjames(at)emolecules(dot)com>> wrote:
>
> This is driving me crazy. A new server, virtually identical
> to an old one, has 50% of the performance with pgbench. I've
> checked everything I can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
>
> Actually it's not 16 cores. It's 8 cores, hyperthreaded.
> Hyperthreading is disabled on the old system.
>
> Is that enough to make this radical difference? (The server is at
> a co-location site, so I have to go down there to boot into the
> BIOS and disable hyperthreading.)
>
> Craig
>
>
My latest development box (Intel Core i7 3770K Ivy Bridge quad-core
with HT, 3.4GHz) has hyperthreading - and it *_does_* make a
significant difference.

Cheers,
Gavin


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-09 00:00:20
Message-ID: CA+CSw_tEj0+2sFXHhCaFb7UurjgXOkCtE02D79Zk9Gt-GLG72A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Oct 9, 2012 at 2:40 AM, Craig James <cjames(at)emolecules(dot)com> wrote:
> Nobody has commented on the hyperthreading question yet ... does it really
> matter? The old (fast) server has hyperthreading disabled, and the new
> (slower) server has hyperthreads enabled.
>
> If hyperthreading is definitely NOT an issue, it will save me a trip to the
> co-lo facility.

Hyperthreading will make lock contention issues worse by having more
threads fighting. Test the new box with Postgres 9.2; if the newer
version exhibits much better scaling behavior, that strongly suggests
lock contention rather than I/O being the root cause.
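
A minimal sketch of such a side-by-side test, assuming the 9.2 packages
live under the usual Ubuntu path and using a throwaway data directory
and port (the paths and port number here are assumptions):

/usr/lib/postgresql/9.2/bin/initdb -D /data/pg92test
/usr/lib/postgresql/9.2/bin/pg_ctl -D /data/pg92test -o "-p 5433" -l /tmp/pg92.log start
/usr/lib/postgresql/9.2/bin/pgbench -i -s 100 -p 5433 postgres
/usr/lib/postgresql/9.2/bin/pgbench -p 5433 -c 50 -t 2000 postgres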

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-09 11:20:14
Message-ID: 5074086E.8050504@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 2012-10-08 23:45, Craig James wrote:
> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything
> I can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606

How are the filesystems formatted and mounted (-o nobarrier)?

regards
Yeb


From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-09 16:02:29
Message-ID: 50744A95.9020403@optionshouse.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/08/2012 06:40 PM, Craig James wrote:

> Nobody has commented on the hyperthreading question yet ... does it
> really matter? The old (fast) server has hyperthreading disabled, and
> the new (slower) server has hyperthreads enabled.

I doubt it's this. With the newer post-Nehalem processors,
hyperthreading is actually much better than it was before. But you also
have this:

CPU     Speed     L3 Cache   DDR3 Speed
E5606   2.13GHz   8MB        800MHz
E5620   2.4GHz    12MB       1066MHz

Even with "equal" threads, the CPUs you have in the new server, as
opposed to the old, are much worse. The E5606 doesn't even have
hyper-threading, so it's not an issue here. In fact, if you enabled it
on the old server, it would likely get *much faster*.

We saw a 40% improvement by enabling hyper-threading. Sure, it's not
100%, but it's not negative or zero, either.

Basically we can see, at the very least, that your servers are not
"identical." Little things like this can make a massive difference. The
old server has a much better CPU. Even crippled without hyperthreading,
I could see it beating the new server.

One thing you might want to check in the BIOS of the new server, is to
make sure that power saving mode is disabled everywhere you can find it.
Some servers come with that set by default, and that puts the CPU to
sleep occasionally, and the spin-up necessary to re-engage it is
punishing and inconsistent. We saw 20-40% drops in pgbench pretty much
at random, when CPU power saving was enabled.
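
If a trip to the BIOS isn't convenient, the symptom is often visible
from a running system too (a sketch; the cpufreq sysfs files are an
assumption and aren't exposed by every kernel/driver combination):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  # "ondemand"/"powersave" suggests throttling
grep MHz /proc/cpuinfo | sort | uniq -c                    # cores sitting below their rated clock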

This doesn't cover why your IO subsystem is slower on the new system,
but I suspect it might have something to do with the memory speed. It
suggests a slower PCI bus, which could choke your RAID card.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com



From: David Thomas <david(at)digitaldogma(dot)org>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-09 16:14:48
Message-ID: 20121009161448.GA27123@digitaldogma.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 08, 2012 at 04:40:31PM -0700, Craig James wrote:
> Nobody has commented on the hyperthreading question yet ... does it
> really matter? The old (fast) server has hyperthreading disabled, and
> the new (slower) server has hyperthreads enabled.
> If hyperthreading is definitely NOT an issue, it will save me a trip to
> the co-lo facility.

From my reading it seems that hyperthreading hasn't been a major issue
for quite some time on modern kernels.
http://archives.postgresql.org/pgsql-performance/2004-10/msg00052.php

I doubt it would hurt much, but I wouldn't make a special trip to the
co-lo to change it.
--
DavidT


From: Craig James <cjames(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-09 16:41:27
Message-ID: CAFwQ8rdc1zXYKZa2M7ytHZz1GAmWcVQhho+Xu2SOPJP_uid50Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Oct 9, 2012 at 9:02 AM, Shaun Thomas <sthomas(at)optionshouse(dot)com>wrote:

> On 10/08/2012 06:40 PM, Craig James wrote:
>
> Nobody has commented on the hyperthreading question yet ... does it
>> really matter? The old (fast) server has hyperthreading disabled, and
>> the new (slower) server has hyperthreads enabled.
>>
>
> I doubt it's this. With the newer post-Nehalem processors, hyperthreading
> is actually much better than it was before. But you also have this:
>
> CPU     Speed     L3 Cache   DDR3 Speed
> E5606   2.13GHz   8MB        800MHz
> E5620   2.4GHz    12MB       1066MHz
>
> Even with "equal" threads, the CPUs you have in the new server, as
> opposed to the old, are much worse. The E5606 doesn't even have
> hyper-threading, so it's not an issue here. In fact, if you enabled it on
> the old server, it would likely get *much faster*.
>

Even more mysterious, because it turns out the configuration is
backwards: I copy-and-pasted the CPU information wrong. I wrote:

> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606

The correct configuration is:

old: 2x4-core Intel Xeon E5606 2.133 GHz
new: 2x4-core Intel Xeon E5620 2.40 GHz

So that makes the poor performance of the new system even more mystifying.

I'm going down there right now to disable hyperthreading and see if that's
the answer. So far, that's the only concrete thing that I've been able to
discover that's different between the two systems.

>
> We saw a 40% improvement by enabling hyper-threading. Sure, it's not 100%,
> but it's not negative or zero, either.
>
> Basically we can see, at the very least, that your servers are not
> "identical." Little things like this can make a massive difference. The old
> server has a much better CPU. Even crippled without hyperthreading, I could
> see it beating the new server.
>
> One thing you might want to check in the BIOS of the new server, is to
> make sure that power saving mode is disabled everywhere you can find it.
> Some servers come with that set by default, and that puts the CPU to sleep
> occasionally, and the spin-up necessary to re-engage it is punishing and
> inconsistent. We saw 20-40% drops in pgbench pretty much at random, when
> CPU power saving was enabled.
>

Thanks, I'll double check that too. That's a good suspect.

>
> This doesn't cover why your IO subsystem is slower on the new system, but
> I suspect it might have something to do with the memory speed. It suggests
> a slower PCI bus, which could choke your RAID card.
>

The motherboards are supposed to be identical. But I'll double check that
too.

Craig

>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas(at)optionshouse(dot)com
>


From: Craig James <cjames(at)emolecules(dot)com>
To: david(at)digitaldogma(dot)org
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-09 16:43:18
Message-ID: CAFwQ8rfWEZDU3Pn4ZC24enA3phop-QGwFRNidnLyMVucP08rHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Oct 9, 2012 at 9:14 AM, David Thomas <david(at)digitaldogma(dot)org> wrote:

> On Mon, Oct 08, 2012 at 04:40:31PM -0700, Craig James wrote:
> > Nobody has commented on the hyperthreading question yet ... does it
> > really matter? The old (fast) server has hyperthreading disabled, and
> > the new (slower) server has hyperthreads enabled.
> > If hyperthreading is definitely NOT an issue, it will save me a trip
> to
> > the co-lo facility.
>
> From my reading it seems that hyperthreading hasn't been a major issue
> for quite some time on modern kernels.
> http://archives.postgresql.org/pgsql-performance/2004-10/msg00052.php
>
> I doubt it would hurt much, but I wouldn't make a special trip to the
> co-lo to change it.
>

At this point I've discovered no other options, so down to the co-lo I go.
I'm also going to check power-save options and the RAID controller's
built-in configuration to see if I overlooked something there (readahead,
blocksize, whatever).

Craig

> --
> DavidT
>


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-11 14:14:11
Message-ID: 5076D433.7090605@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/09/2012 01:40 AM, Craig James wrote:
> Nobody has commented on the hyperthreading question yet ... does it really matter? The old (fast) server has hyperthreading disabled, and the new (slower) server has hyperthreads enabled.
>
> If hyperthreading is definitely NOT an issue, it will save me a trip to the co-lo facility.

Sorry to come late to the party, but being in a similar condition
I've googled a bit and found a way to disable hyperthreading without
the need to reboot the system and enter the BIOS:

echo 0 >/sys/devices/system/node/node0/cpuX/online

where X ranges from 1 to (#cores * 2 - 1) if hyperthreading is enabled
(cpu0 can't be switched off).

I didn't try it myself on a live system, but I definitely will
as soon as I have a new machine to test.
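
For example, a sketch that offlines only the HT siblings, using the
topology files to find them (run as root; the sysfs paths are as on
recent Linux kernels, and this is just as untested as the one-liner):

for cpu in /sys/devices/system/cpu/cpu[1-9]*; do
    # first logical cpu of this core, e.g. "0" from "0,8" or "0-1"
    first=$(cut -d- -f1 "$cpu/topology/thread_siblings_list" | cut -d, -f1)
    # offline this cpu only if it is not the first thread of its core
    [ "${cpu##*cpu}" != "$first" ] && echo 0 > "$cpu/online"
done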

Andrea


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-11 14:19:33
Message-ID: CAGTBQpYGyq70q_S_dUngSwB7yLbbCxetnfXpaMC9LEr7zL6iWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Oct 11, 2012 at 11:14 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
> Sorry to come late to the party, but being in a similar condition
> I've googled a bit and found a way to disable hyperthreading without
> the need to reboot the system and enter the BIOS:
>
> echo 0 >/sys/devices/system/node/node0/cpuX/online
>
> where X ranges from 1 to (#cores * 2 - 1) if hyperthreading is enabled
> (cpu0 can't be switched off).
>
> I didn't try it myself on a live system, but I definitely will
> as soon as I have a new machine to test.

Question is... will that remove the performance penalty of HyperThreading?

I don't think so, because a big one is the register file split (half
the hardware registers go to a CPU, half to the other). If that action
doesn't tell the CPU to "unsplit", some shared components may become
unbogged, like the decode stage probably, but I'm not sure it's the
same as disabling it from the BIOS.


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-11 14:40:14
Message-ID: 5076DA4E.80504@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/11/2012 04:19 PM, Claudio Freire wrote:
> On Thu, Oct 11, 2012 at 11:14 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>> Sorry to come late to the party, but being in a similar condition
>> I've googled a bit and found a way to disable hyperthreading without
>> the need to reboot the system and enter the BIOS:
>>
>> echo 0 >/sys/devices/system/node/node0/cpuX/online
>>
>> where X ranges from 1 to (#cores * 2 - 1) if hyperthreading is enabled
>> (cpu0 can't be switched off).
>>
>> I didn't try it myself on a live system, but I definitely will
>> as soon as I have a new machine to test.
>
> Question is... will that remove the performance penalty of HyperThreading?

So I've added to my todo list to perform a test to verify this claim :)

> I don't think so, because a big one is the register file split (half
> the hardware registers go to a CPU, half to the other). If that action
> doesn't tell the CPU to "unsplit", some shared components may become
> unbogged, like the decode stage probably, but I'm not sure it's the
> same as disabling it from the BIOS.

That said, I think you're probably right that disabling HT through the
sysfs interface won't really remove the performance penalty.

thanks

Andrea


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 08:27:10
Message-ID: 507BC8DE.9050109@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/11/2012 04:40 PM, Andrea Suisani wrote:
> On 10/11/2012 04:19 PM, Claudio Freire wrote:
>> On Thu, Oct 11, 2012 at 11:14 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>>> Sorry to come late to the party, but being in a similar condition
>>> I've googled a bit and found a way to disable hyperthreading without
>>> the need to reboot the system and enter the BIOS:
>>>
>>> echo 0 >/sys/devices/system/node/node0/cpuX/online
>>>
>>> where X ranges from 1 to (#cores * 2 - 1) if hyperthreading is enabled
>>> (cpu0 can't be switched off).
>>>
>>> I didn't try it myself on a live system, but I definitely will
>>> as soon as I have a new machine to test.
>>
>> Question is... will that remove the performance penalty of HyperThreading?
>
> So I've added to my todo list to perform a test to verify this claim :)

done.

in brief: the box is a Dell PowerEdge r720 with 16GB of RAM,
the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array
(SAS, 15K rpm), and the controller is a PERC H710 (BBWC with a 512 MB
cache).

Postgres ver 9.2.1 (sorry for not having benchmarked 9.1,
but this is what we plan to deploy in production). Both the OS
(Ubuntu 12.04.1) and Postgres have been briefly tuned according
to the usual standards while trying to mimic Craig's configuration
(see specific settings at the bottom).

TPS including connection establishing, pgbench run in a single
thread mode, connection made through unix socket, OS cache dropped
and Postgres restarted for every run.
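
In other words, each cell in the table below came from a run like this
(a sketch; -c and -t vary per row, and the drop-caches/restart commands
match the ones quoted later in this thread):

sync && echo 3 > /proc/sys/vm/drop_caches
/etc/init.d/postgresql-9.2 restart
pgbench -c 20 -t 5000 pgbench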

those are the results:

HT HT SYSFS DIS HT BIOS DISABLE
-c -t r1 r2 r3 r1 r2 r3 r1 r2 r3
5 20K 1641 1831 1496 2020 1974 2033 2005 1988 1967
10 10K 2161 2134 2136 2277 2252 2216 1854 1824 1810
20 5k 2550 2508 2558 2417 2388 2357 1924 1928 1954
30 3333 2216 2272 2250 2333 2493 2496 1993 2009 2008
40 2.5K 2179 2221 2250 2568 2535 2500 2025 2048 2018
50 2K 2217 2213 2213 2487 2449 2604 2112 2016 2023

Despite the fact that the results don't match my expectations
(I suspect that there's something wrong with the PERC
because having the controller cache enabled makes no
difference in terms of TPS), it seems strange that disabling
HT from the BIOS gives lower TPS than disabling HT through the
sysfs interface.

OS conf:

vm.swappiness=0
vm.overcommit_memory=2
vm.dirty_ratio=2
vm.dirty_background_ratio=1
kernel.shmmax=3454820352
kernel.shmall=2048341
/sbin/blockdev --setra 8192 /dev/sdb
$PGDATA is on ext4 (rw,noatime)
Linux cloud 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
sdb scheduler is [cfq]

DB conf:

max_connections = 100
shared_buffers = 3200MB
work_mem = 30MB
maintenance_work_mem = 800MB
synchronous_commit = off
full_page_writes = off
checkpoint_segments = 40
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9
random_page_cost = 3.5
effective_cache_size = 10GB
log_autovacuum_min_duration = 0
autovacuum_naptime = 5min

Andrea

p.s. as a last attempt at increasing TPS
I've changed the scheduler from cfq to deadline,
and for -c 5 -t 20K I got r1=3007, r2=2930 and r3=2985.


From: Craig James <cjames(at)emolecules(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:01:08
Message-ID: CAFwQ8rfqFaSWdjAK5vQoJew6Usb6Ofn0BTy+MCxgMPizfrRpgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 15, 2012 at 1:27 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
> On 10/11/2012 04:40 PM, Andrea Suisani wrote:
>>
>> On 10/11/2012 04:19 PM, Claudio Freire wrote:
>>>
>>> On Thu, Oct 11, 2012 at 11:14 AM, Andrea Suisani <sickpig(at)opinioni(dot)net>
>>> wrote:
>>>>
>>>> Sorry to come late to the party, but being in a similar condition
>>>> I've googled a bit and found a way to disable hyperthreading
>>>> without
>>>> the need to reboot the system and enter the BIOS:
>>>>
>>>> echo 0 >/sys/devices/system/node/node0/cpuX/online
>>>>
>>>> where X ranges from 1 to (#cores * 2 - 1) if hyperthreading is enabled
>>>> (cpu0 can't be switched off).
>>>>
>>>> I didn't try it myself on a live system, but I definitely will
>>>> as soon as I have a new machine to test.
>>>
>>>
>>> Question is... will that remove the performance penalty of
>>> HyperThreading?
>>
>>
>> So I've added to my todo list to perform a test to verify this claim :)
>
>
> done.
>
> in brief: the box is a Dell PowerEdge r720 with 16GB of RAM,
> the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
> array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array
> (SAS, 15K rpm), and the controller is a PERC H710 (BBWC with a 512 MB
> cache).
>
> Postgres ver 9.2.1 (sorry for not having benchmarked 9.1,
> but this is what we plan to deploy in production). Both the OS
> (Ubuntu 12.04.1) and Postgres have been briefly tuned according
> to the usual standards while trying to mimic Craig's configuration
> (see specific settings at the bottom).
>
> TPS including connection establishing, pgbench run in a single
> thread mode, connection made through unix socket, OS cache dropped
> and Postgres restarted for every run.
>
> those are the results:
>
> HT HT SYSFS DIS HT BIOS DISABLE
> -c -t r1 r2 r3 r1 r2 r3 r1 r2 r3
> 5 20K 1641 1831 1496 2020 1974 2033 2005 1988 1967
> 10 10K 2161 2134 2136 2277 2252 2216 1854 1824 1810
> 20 5k 2550 2508 2558 2417 2388 2357 1924 1928 1954
> 30 3333 2216 2272 2250 2333 2493 2496 1993 2009 2008
> 40 2.5K 2179 2221 2250 2568 2535 2500 2025 2048 2018
> 50 2K 2217 2213 2213 2487 2449 2604 2112 2016 2023
>
> Despite the fact that the results don't match my expectations

You have a RAID1 with 15K SAS disks. I have a RAID10 with 8 7200 SATA
disks plus another RAID1 for the XLOG file system. Ten 7K SATA disks
on two file systems should be quite a bit faster than two 15K SAS
disks, right?

> (I suspect that there's something wrong with the PERC
> because having the controller cache enabled makes no
> difference in terms of TPS), it seems strange that disabling
> HT from the BIOS gives lower TPS than disabling HT through the
> sysfs interface.

Well, all I can say is that I like my 3WARE controllers, and it's the
secondary reason why I moved away from Dell (the primary reason is
price).

Craig

>
> OS conf:
>
> vm.swappiness=0
> vm.overcommit_memory=2
> vm.dirty_ratio=2
> vm.dirty_background_ratio=1
> kernel.shmmax=3454820352
> kernel.shmall=2048341
> /sbin/blockdev --setra 8192 /dev/sdb
> $PGDATA is on ext4 (rw,noatime)
> Linux cloud 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012
> x86_64 x86_64 x86_64 GNU/Linux
> sdb scheduler is [cfq]
>
> DB conf:
>
> max_connections = 100
> shared_buffers = 3200MB
> work_mem = 30MB
> maintenance_work_mem = 800MB
> synchronous_commit = off
> full_page_writes = off
> checkpoint_segments = 40
> checkpoint_timeout = 5min
> checkpoint_completion_target = 0.9
> random_page_cost = 3.5
> effective_cache_size = 10GB
> log_autovacuum_min_duration = 0
> autovacuum_naptime = 5min
>
>
> Andrea
>
> p.s. as a last attempt at increasing TPS
> I've changed the scheduler from cfq to deadline,
> and for -c 5 -t 20K I got r1=3007, r2=2930 and r3=2985.
>
>
>


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:01:37
Message-ID: CAGTBQpaHGxo9iFWoTJhuU+eTOK13Q-YkQk+9QC7xuT5HDmbv6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 15, 2012 at 5:27 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
> it seems strange that disabling
> HT from the BIOS gives lower TPS than disabling HT through the
> sysfs interface.

It does prove they're not equivalent though.


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:24:58
Message-ID: 507C2ACA.3020503@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/15/2012 05:01 PM, Claudio Freire wrote:
> On Mon, Oct 15, 2012 at 5:27 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>> it seems strange that disabling
>> HT from the BIOS gives lower TPS than disabling HT through the
>> sysfs interface.
>
> It does prove they're not equivalent though.
>

Sure, you're right.

It's just that my bet was on a higher throughput
when HT was disabled from the BIOS (as you stated
previously in this thread).

Andrea


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:28:45
Message-ID: CAGTBQpbt6sFkbGqELBzesTuz=v=iY3NQPdf+_AppfN7KNzkJEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>> It does prove they're not equivalent though.
>>
>
> Sure, you're right.
>
> It's just that my bet was on a higher throughput
> when HT was disabled from the BIOS (as you stated
> previously in this thread).

Yes, mine too. It's bizarre. If I were you, I'd look into it more
deeply. It may be a flaw in your test methodology (maybe you disabled
the wrong cores?). If not, it would be good to know where the extra TPS
comes from, to replicate it elsewhere.


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Andrea Suisani <sickpig(at)opinioni(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:32:45
Message-ID: CAOR=d=1musjC9ZOmNWTvyFAv6i4vvLHZcWQzUM3pmdRVVEao6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 15, 2012 at 9:01 AM, Craig James <cjames(at)emolecules(dot)com> wrote:
> On Mon, Oct 15, 2012 at 1:27 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>> (I suspect that there's something wrong with the PERC
>> because having the controller cache enabled makes no
>> difference in terms of TPS), it seems strange that disabling
>> HT from the BIOS gives lower TPS than disabling HT through the
>> sysfs interface.
>
> Well, all I can say is that I like my 3WARE controllers, and it's the
> secondary reason why I moved away from Dell (the primary reason is
> price).

Mediocre performance, random lockups, and Dell's refusal to address
said lockups are the reasons I abandoned Dell's PERC controllers. My
preference is Areca 1680/1880, then 3Ware 96xx, then LSI, then
Adaptec. Areca's web interface on a dedicated ethernet port makes them
super easy to configure while the machine is running, with no need for
specialized software for a given OS, and their performance and
reliability are great. The 3Wares are very solid with later-model
BIOS on board. LSI gets a raspberry for MegaCLI, the 2nd clunkiest
interface ever, the worst being their horrible, horrible BIOS boot
setup screen.


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Andrea Suisani <sickpig(at)opinioni(dot)net>, Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:34:39
Message-ID: CAOR=d=02vBk24Mt7cJd9SBxjAcL26Ao=N_8o7ZxB-qe0VW1T8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 15, 2012 at 9:28 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>> Sure, you're right.
>>
>> It's just that my bet was on a higher throughput
>> when HT was disabled from the BIOS (as you stated
>> previously in this thread).
>
> Yes, mine too. It's bizarre. If I were you, I'd look into it more
> deeply. It may be a flaw in your test methodology (maybe you disabled
> the wrong cores?). If not, it would be good to know where the extra TPS
> comes from, to replicate it elsewhere.

I'd recommend more synthetic benchmarks when trying to compare systems
like this. bonnie++, the memory stream test that Greg Smith was
working on, and so on. Get an idea what core differences the machines
display under such testing.


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:45:24
Message-ID: 507C2F94.6040000@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[cut]

>> TPS including connection establishing, pgbench run in a single
>> thread mode, connection made through unix socket, OS cache dropped
>> and Postgres restarted for every run.
>>
>> those are the results:
>>
>> HT HT SYSFS DIS HT BIOS DISABLE
>> -c -t r1 r2 r3 r1 r2 r3 r1 r2 r3
>> 5 20K 1641 1831 1496 2020 1974 2033 2005 1988 1967
>> 10 10K 2161 2134 2136 2277 2252 2216 1854 1824 1810
>> 20 5k 2550 2508 2558 2417 2388 2357 1924 1928 1954
>> 30 3333 2216 2272 2250 2333 2493 2496 1993 2009 2008
>> 40 2.5K 2179 2221 2250 2568 2535 2500 2025 2048 2018
>> 50 2K 2217 2213 2213 2487 2449 2604 2112 2016 2023
>>
>> Despite the fact that the results don't match my expectations
>
> You have a RAID1 with 15K SAS disks. I have a RAID10 with 8 7200 SATA
> disks plus another RAID1 for the XLOG file system. Ten 7K SATA disks
> on two file systems should be quite a bit faster than two 15K SAS
> disks, right?

I think you're right, but I've never had the chance to try such
a configuration first-hand. Yes, spreading I/O over two
different subsystems (xlog and pgdata) and having pgdata on
a RAID10 should surely outperform my RAID1 with 15K SAS disks.

>> (I suspect that there's something wrong with the PERC
>> because having the controller cache enabled makes no
>> difference in terms of TPS), it seems strange that disabling
>> HT from the BIOS gives lower TPS than disabling HT through the
>> sysfs interface.
>
> Well, all I can say is that I like my 3WARE controllers, and it's the
> secondary reason why I moved away from Dell (the primary reason is
> price).

Something I will surely take into account the next time
I buy a new server.

Andrea


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:56:07
Message-ID: 507C3217.6040905@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/15/2012 05:28 PM, Claudio Freire wrote:
> On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>>> It does prove they're not equivalent though.
>>>
>>
>> Sure, you're right.
>>
>> It's just that my bet was on a higher throughput
>> when HT was disabled from the BIOS (as you stated
>> previously in this thread).
>
> Yes, mine too. It's bizarre. If I were you, I'd look into it more
> deeply. It may be a flaw in your test methodology (maybe you disabled
> the wrong cores?).

this is the first thing I thought after looking at the results,
but I've double-checked the core topology (core_id, core_siblings_list and
friends under /sys/devices/system/cpu/cpu0/topology) and it seems
to me that I've disabled the right ones.

It could be that I've messed up something else...

> If not, it would be good to know where the extra TPS
> comes from, to replicate it elsewhere.

I will definitely try to understand the
probable causes by performing other tests...
any hints are welcome :)

>


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-15 15:56:44
Message-ID: 507C323C.2090805@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/15/2012 05:34 PM, Scott Marlowe wrote:
> On Mon, Oct 15, 2012 at 9:28 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>>> Sure, you're right.
>>>
>>> It's just that my bet was on a higher throughput
>>> when HT was disabled from the BIOS (as you stated
>>> previously in this thread).
>>
>> Yes, mine too. It's bizarre. If I were you, I'd look into it more
>> deeply. It may be a flaw in your test methodology (maybe you disabled
>> the wrong cores?). If not, it would be good to know where the extra TPS
>> comes from, to replicate it elsewhere.
>
> I'd recommend more synthetic benchmarks when trying to compare systems
> like this. bonnie++, the memory stream test that Greg Smith was
> working on, and so on. Get an idea what core differences the machines
> display under such testing.
>

Will try tomorrow
thanks for the hint

Andrea


From: Marinos Yannikos <mjy(at)geizhals(dot)at>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-16 05:07:14
Message-ID: 507CEB82.8040203@geizhals.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 15.10.2012 17:01, Craig James wrote:
>>>> On Thu, Oct 11, 2012 at 11:14 AM, Andrea Suisani <sickpig(at)opinioni(dot)net>
>>>> wrote:
>>>>> I've googled a bit and found a way to disable hyperthreading
>>>>> without
>>>>> the need to reboot the system and enter the BIOS:
>>>>>
>>>>> echo 0 >/sys/devices/system/node/node0/cpuX/online

A safer method is probably to just add the "noht" kernel boot option and
reboot.

Did you set the same stride / stripe-width values on your FS when you
initialized them? Are both really freshly-made ext4 FS and not e.g. the
old one an ext3 mounted as ext4? Do all the disks have the same cache,
link speed and NCQ settings (for their own caches, not the controller;
try /c0/p0 show all etc. with tw_cli)?
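
Concretely, something along these lines (a sketch; the device name and
controller/port numbers are assumptions for your layout):

tune2fs -l /dev/sdb1 | grep -i 'stride\|stripe'   # ext4 stride / stripe-width
dumpe2fs -h /dev/sdb1 | grep -i features          # feature list reveals ext3 vs ext4
tw_cli /c0/p0 show all                            # per-drive cache, link speed, NCQ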

-mjy


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-17 15:45:23
Message-ID: 507ED293.6010803@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/15/2012 05:34 PM, Scott Marlowe wrote:
> On Mon, Oct 15, 2012 at 9:28 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>>> Sure, you're right.
>>>
>>> It's just that my bet was on a higher throughput
>>> when HT was disabled from the BIOS (as you stated
>>> previously in this thread).
>>
>> Yes, mine too. It's bizarre. If I were you, I'd look into it more
>> deeply. It may be a flaw in your test methodology (maybe you disabled
>> the wrong cores?). If not, it would be good to know where the extra TPS
>> comes from, to replicate it elsewhere.
>
> I'd recommend more synthetic benchmarks when trying to compare systems
> like this. bonnie++,

You were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if any)
difference in terms of sequential input whether or not cache is enabled on the
RAID1 (SAS 15K, sdb).

I've run 2 bonnie++ tests with both cache enabled and disabled, and what I get
(see attachments for more details) is 400MB/s sequential input (cache) vs
390MB/s (nocache).

I dunno why, but I would have expected a higher delta (due to the 512MB cache),
not a mere 10MB/s; this is only based on my gut feeling, though.

I've also tried to test the RAID1 array where the OS is installed (2 SATA 7.2Krpm, sda)
just to verify whether the cache effect is comparable with the one I get from the SAS disks.

Well, it seems that there's no cache effect, or if there is one it's so small as to be
confused with the noise.

Both arrays are configured with these params:

Read Policy : Adaptive Read Ahead
Write Policy : Write Back
Stripe Element Size : 64 KB
Disk Cache Policy : Disabled

Those tests were performed with HT disabled from the BIOS, but without
using the noht kernel boot param. The scheduler for sdb was set to deadline,
while sda kept the default cfq.

> the memory stream test that Greg Smith was
> working on, and so on.

this one https://github.com/gregs1104/stream-scaling, right?
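
For anyone wanting to repeat it, the whole test is driven by one script
in that repository (a sketch; I'm assuming the script name matches the
repo name):

git clone https://github.com/gregs1104/stream-scaling
cd stream-scaling && ./stream-scaling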

I've executed the test with HT enabled, HT disabled from the BIOS,
and HT disabled using the sysfs interface. Attached are 3 graphs and
related text files.

> Get an idea what core differences the machines
> display under such testing.

I'm trying... hard :)

Andrea

Attachment Content-Type Size
bonnie_sdb_cache_wo_pgsql text/plain 533 bytes
bonnie_sdb_cache_wo_pgsql_2 text/plain 535 bytes
bonnie_sdb_nocache_wo_pgsql text/plain 534 bytes
bonnie_sdb_nocache_wo_pgsql_2 text/plain 536 bytes
image/png 3.3 KB
stream-ht_disabled_bios.txt text/plain 4.5 KB
image/png 3.3 KB
stream-ht_disabled_sysfs.txt text/plain 1.8 KB
image/png 3.6 KB
stream-ht_enabled.txt text/plain 6.6 KB

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-17 16:35:05
Message-ID: CAOR=d=0s7WZmb3AmQxYvKq6BKgszXA=avV3S-5b0dLe0KK5raw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Oct 17, 2012 at 9:45 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
> On 10/15/2012 05:34 PM, Scott Marlowe wrote:
>> I'd recommend more synthetic benchmarks when trying to compare systems
>> like this. bonnie++,
>
>
> You were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if
> any)
> difference in terms of sequential input whether or not cache is enabled on
> the
> RAID1 (SAS 15K, sdb).

I'm mainly wanting to know the difference between the two systems, so
if you can run it on the old and new machines and compare, that's
the real test.

> I've run 2 bonnie++ tests with both cache enabled and disabled, and what I get
> (see attachments for more details) is 400MB/s sequential input (cache)
> vs
> 390MB/s (nocache).
>
> I dunno why, but I would have expected a higher delta (due to the 512MB
> cache),
> not a mere 10MB/s; this is only based on my gut feeling, though.

Well the sequential throughput doesn't really rely on caching. It's
the random writes that benefit from caching, and the other things
(random reads and seq read/write) that indirectly benefit because the
random writes are so much faster that they no longer get in the way.
So mostly compare random access between the old and new machines and
look for differences there.
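
One way to compare random access directly on both machines is a short
fio run (a sketch; fio availability, the file location and the sizes
are assumptions, and --direct=1 bypasses the page cache):

fio --name=randrw --filename=/data/fio.tmp --size=16G --bs=8k \
    --rw=randrw --rwmixread=70 --direct=1 --ioengine=libaio \
    --iodepth=16 --runtime=60 --time_based --group_reporting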
>> the memory stream test that Greg Smith was
>> working on, and so on.
>
>
> this one https://github.com/gregs1104/stream-scaling, right?

Yep.

> I've executed the test with HT enabled, HT disabled from the BIOS
> and HT disable using sys interface. Attached 3 graphs and related
> text files

Well it's pretty meh. I'd like to see the older machine compared to
the newer one here tho.

> I'm trying... hard :)

You're doing great. These problems take effort to sort out.


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig James <cjames(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-18 06:57:12
Message-ID: 507FA848.7040801@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/17/2012 06:35 PM, Scott Marlowe wrote:
> On Wed, Oct 17, 2012 at 9:45 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
>> On 10/15/2012 05:34 PM, Scott Marlowe wrote:
>>> I'd recommend more synthetic benchmarks when trying to compare systems
>>> like this. bonnie++,
>>
>>
>> You were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if
>> any)
>> difference in terms of sequential input whether or not cache is enabled on
>> the
>> RAID1 (SAS 15K, sdb).

Maybe there's a misunderstanding here.. :) Craig (James) is the one
who started this thread. I joined later, suggesting a way to
disable HT without rebooting (using the sysfs interface), trying to
save Craig a trip to the data center.

At that point Claudio Freire wondered whether disabling HT from sysfs
would remove the performance penalty that Craig had experienced.

So I decided to test this on a brand new box that I'd just bought.

While performing this test I discovered by chance that
the RAID controller (PERC H710) behaves in an unexpected way,
in that the HW cache has almost no effect in terms of TPS in
a pgbench session.

> I'm mainly wanting to know the difference between the two systems, so
> if you can run it on the old and new machines and compare, that's
> the real test.

This is something that Craig can do.

[cut]

>> I dunno why, but I would have expected a higher delta (due to the 512MB
>> cache),
>> not a mere 10MB/s; this is only based on my gut feeling, though.
>
> Well the sequential throughput doesn't really rely on caching. It's
> the random writes that benefit from caching, and the other things
> (random reads and seq read/write) that indirectly benefit because the
> random writes are so much faster that they no longer get in the way.
> So mostly compare random access between the old and new machines and
> look for differences there.

Makes sense.

I will focus on tests that measure random access patterns.

>>> the memory stream test that Greg Smith was
>>> working on, and so on.
>>
>>
>> this one https://github.com/gregs1104/stream-scaling, right?
>
> Yep.
>
>> I've executed the test with HT enabled, HT disabled from the BIOS,
>> and HT disabled using the sysfs interface. Attached are 3 graphs and
>> related text files.
>
> Well it's pretty meh.

:/

do you think that the Xeon 5620 performs poorly?

> I'd like to see the older machine compared to
> the newer one here tho.

this one is also on Craig's side.

>> I'm trying... hard :)
>
> You're doing great. These problems take effort to sort out.

thanks


From: Craig James <cjames(at)emolecules(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-18 16:39:45
Message-ID: CAFwQ8rcW744OLOF5FX4=+Q3rizXvPaX0w8POcWp0qsvD6vCBLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Oct 17, 2012 at 11:57 PM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
> On 10/17/2012 06:35 PM, Scott Marlowe wrote:
>>
>> On Wed, Oct 17, 2012 at 9:45 AM, Andrea Suisani <sickpig(at)opinioni(dot)net>
>> wrote:
>>>
>>> On 10/15/2012 05:34 PM, Scott Marlowe wrote:
>>>>
>>>> I'd recommend more synthetic benchmarks when trying to compare systems
>>>> like this. bonnie++,
>>>
>>>
>>>
>>> You were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if
>>> any)
>>> difference in terms of sequential input whether or not cache is enabled
>>> on
>>> the
>>> RAID1 (SAS 15K, sdb).
>
>
> Maybe there's a misunderstanding here.. :) Craig (James) is the one
> who started this thread. I joined later, suggesting a way to
> disable HT without rebooting (using the sysfs interface), trying to
> save Craig a trip to the data center.
>
> At that point Claudio Freire wondered whether disabling HT from sysfs
> would remove the performance penalty that Craig had experienced.
>
> So I decided to test this on a brand new box that I'd just bought.
>
> While performing this test I discovered by chance that
> the RAID controller (PERC H710) behaves in an unexpected way,
> in that the HW cache has almost no effect in terms of TPS in
> a pgbench session.
>
>> I'm mainly wanting to know the difference between the two systems, so
>> if you can run it on the old and new machines and compare, that's
>> the real test.
>
>
> This is something that Craig can do.

Too late ... the new machine is in production.

Craig

>
> [cut]
>
>>> I dunno why, but I would have expected a higher delta (due to the 512MB
>>> cache),
>>> not a mere 10MB/s; this is only based on my gut feeling, though.
>
>>
>>
>> Well the sequential throughput doesn't really rely on caching. It's
>> the random writes that benefit from caching, and the other things
>> (random reads and seq read/write) that indirectly benefit because the
>> random writes are so much faster that they no longer get in the way.
>> So mostly compare random access between the old and new machines and
>> look for differences there.
>
>
> Makes sense.
>
> I will focus on tests that measure random access patterns.
>
>>>> the memory stream test that Greg Smith was
>>>> working on, and so on.
>>>
>>>
>>>
>>> this one https://github.com/gregs1104/stream-scaling, right?
>>
>>
>> Yep.
>>
>>> I've executed the test with HT enabled, HT disabled from the BIOS,
>>> and HT disabled using the sysfs interface. Attached are 3 graphs and
>>> related text files.
>>
>>
>> Well it's pretty meh.
>
>
> :/
>
> do you think that the Xeon 5620 performs poorly?
>
>> I'd like to see the older machine compared to
>> the newer one here tho.
>
>
> this one is also on Craig's side.
>
>>> I'm trying... hard :)
>>
>>
>> You're doing great. These problems take effort to sort out.
>
>
> thanks
>
>


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-05 15:34:24
Message-ID: 50BF6980.3090800@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[sorry for resuming an old thread]

[cut]

>>> Question is... will that remove the performance penalty of HyperThreading?
>>
>> So I've added to my todo list to perform a test to verify this claim :)
>
> done.

on this box:

> in brief: the box is a Dell PowerEdge r720 with 16GB of RAM,
> the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
> array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array
> (SAS, 15K rpm), and the controller is a PERC H710 (BBWC with a 512 MB
> cache). (Ubuntu 12.04)

with Postgres 9.2.1 and $PGDATA on an ext4-formatted partition
I've got:

> those are the results:
>
> HT HT SYSFS DIS HT BIOS DISABLE
> -c -t r1 r2 r3 r1 r2 r3 r1 r2 r3
> 5 20K 1641 1831 1496 2020 1974 2033 2005 1988 1967
> 10 10K 2161 2134 2136 2277 2252 2216 1854 1824 1810
> 20 5k 2550 2508 2558 2417 2388 2357 1924 1928 1954
> 30 3333 2216 2272 2250 2333 2493 2496 1993 2009 2008
> 40 2.5K 2179 2221 2250 2568 2535 2500 2025 2048 2018
> 50 2K 2217 2213 2213 2487 2449 2604 2112 2016 2023

on the same machine with the same configuration,
having PGDATA on an xfs-formatted partition gives me
much better TPS.

e.g. pgbench -c 20 -t 5000 gives me 6305 TPS
(3 runs with "echo 3 > /proc/sys/vm/drop_caches && /etc/init.d/postgresql-9.2 restart"
in between).
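
For reference, a minimal way to reproduce such a setup might be (a
sketch; the device name and mount point are assumptions, and noatime
matches the mount options listed below in the thread):

mkfs.xfs /dev/sdb1
mount -o noatime /dev/sdb1 /db
# then move or initdb $PGDATA there and rerun the same pgbench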

Has anybody else experienced this kind of difference
between ext4 and xfs?

Andrea


From: Jean-David Beyer <jeandavid8(at)verizon(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-05 16:51:08
Message-ID: 50BF7B7C.9000204@verizon.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 12/05/2012 10:34 AM, Andrea Suisani wrote:
> [sorry for resuming an old thread]
>
> [cut]
>
>>>> Question is... will that remove the performance penalty of
>>>> HyperThreading?
>>>
>>> So I've added to my todo list to perform a test to verify this claim :)
>>
>> done.
>
> on this box:
>
>> in brief: the box is a Dell PowerEdge r720 with 16GB of RAM,
>> the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
>> array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array
>> (SAS, 15K rpm), and the controller is a PERC H710 (BBWC with a 512 MB
>> cache). (Ubuntu 12.04)
>
> with Postgres 9.2.1 and $PGDATA on an ext4-formatted partition
> I've got:
>
>> those are the results:
>>
>> HT HT SYSFS DIS HT BIOS DISABLE
>> -c -t r1 r2 r3 r1 r2 r3 r1 r2 r3
>> 5 20K 1641 1831 1496 2020 1974 2033 2005 1988 1967
>> 10 10K 2161 2134 2136 2277 2252 2216 1854 1824 1810
>> 20 5k 2550 2508 2558 2417 2388 2357 1924 1928 1954
>> 30 3333 2216 2272 2250 2333 2493 2496 1993 2009 2008
>> 40 2.5K 2179 2221 2250 2568 2535 2500 2025 2048 2018
>> 50 2K 2217 2213 2213 2487 2449 2604 2112 2016 2023
>
> on the same machine with the same configuration,
> having PGDATA on an xfs-formatted partition gives me
> much better TPS.
>
> e.g. pgbench -c 20 -t 5000 gives me 6305 TPS
> (3 runs with "echo 3 > /proc/sys/vm/drop_caches &&
> /etc/init.d/postgresql-9.2 restart"
> in between).
>
> Has anybody else experienced this kind of difference
> between ext4 and xfs?
>
> Andrea
>
>
>
I thought that PostgreSQL did its own journalling, if that is the proper
term, so why not use an ext2 file system to lower overhead?


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Jean-David Beyer <jeandavid8(at)verizon(dot)net>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-05 16:56:32
Message-ID: CAGTBQpYtN3a=ryQOcGyZPNYz4Ts=Fbj395thp7Ss3GHQqSQi3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Dec 5, 2012 at 1:51 PM, Jean-David Beyer <jeandavid8(at)verizon(dot)net> wrote:
> I thought that PostgreSQL did its own journalling, if that is the proper
> term, so why not use an ext2 file system to lower overhead?

Because you can still have metadata-level corruption.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jean-David Beyer <jeandavid8(at)verizon(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-05 17:00:56
Message-ID: 50BF7DC8.7040702@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


On 12/05/2012 11:51 AM, Jean-David Beyer wrote:
>>
>>
> I thought that PostgreSQL did its own journalling, if that is the
> proper term, so why not use an ext2 file system to lower overhead?

Postgres journalling will not save you from a corrupt file system.

cheers

andrew


From: John Lister <john(dot)lister(at)kickstone(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-06 08:29:46
Message-ID: 50C0577A.6070401@kickstone.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


> on this box:
>
>> in brief: the box is a Dell PowerEdge r720 with 16GB of RAM,
>> the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
>> array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array
>> (SAS, 15K rpm), and the controller is a PERC H710 (BBWC with a 512 MB
>> cache). (Ubuntu 12.04)
>
> on the same machine with the same configuration,
> having PGDATA on an xfs-formatted partition gives me
> much better TPS.
>
> e.g. pgbench -c 20 -t 5000 gives me 6305 TPS
> (3 runs with "echo 3 > /proc/sys/vm/drop_caches &&
> /etc/init.d/postgresql-9.2 restart"
> in between).
Hi, I found this interesting as I'm trying to do some benchmarks on my
box, which is very similar to the above, but I don't believe the TPS is
anywhere near what it should be. Is the 6305 figure from xfs? I'm
assuming that your main data array is just 2 15K SAS drives; are you
putting the WAL on the data array, or is that stored somewhere else? Can
I ask what scaling params, etc. you used to build the pgbench tables, and
can I look at your postgresql.conf file to see if I missed something
(offline if you wish)?

I'm running 8x SSDs in RAID 10 for the data and pull just under 10k on an
xfs system, which is much lower than I'd expect for that setup and isn't
significantly greater than your reported results, so something must be
very wrong.

Thanks

John


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: John Lister <john(dot)lister(at)kickstone(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-06 08:44:32
Message-ID: 50C05AF0.1050306@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi John,

On 12/06/2012 09:29 AM, John Lister wrote:
>
>> on this box:
>>
>>> in brief: the box is a Dell PowerEdge r720 with 16GB of RAM,
>>> the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
>>> array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array
>>> (SAS, 15K rpm), and the controller is a PERC H710 (BBWC with a 512 MB
>>> cache). (Ubuntu 12.04)
>>
>> on the same machine with the same configuration,
>> having PGDATA on an xfs-formatted partition gives me
>> much better TPS.
>>
>> e.g. pgbench -c 20 -t 5000 gives me 6305 TPS
>> (3 runs with "echo 3 > /proc/sys/vm/drop_caches && /etc/init.d/postgresql-9.2 restart"
>> in between).

> Hi, I found this interesting as I'm trying to do some benchmarks on my box, which is
> very similar to the above, but I don't believe the TPS is anywhere near what it should be.
> Is the 6305 figure from xfs?

yes, it is.

> I'm assuming that your main data array is just 2 15k sas drives,

correct

> are you putting the WAL on the data array or is that stored somewhere else?

pg_xlog is placed in the data array.

> Can I ask what scaling params,

Sure, I've initialized the pgbench db by issuing:

pgbench -i -s 10 pgbench

> etc you used to build the pgbench tables and look at your postgresql.conf file to see if I missed something (offline if you wish)

These are the non-default values in postgresql.conf:

listen_addresses = '*'
max_connections = 100
shared_buffers = 3200MB
work_mem = 30MB
maintenance_work_mem = 800MB
synchronous_commit = off
full_page_writes = off
checkpoint_segments = 40
checkpoint_completion_target = 0.9
random_page_cost = 3.5
effective_cache_size = 10GB
log_timezone = 'localtime'
stats_temp_directory = 'pg_stat_tmp_ram'
autovacuum_naptime = 5min

and then OS tweaks:

HT bios disabled
/sbin/blockdev --setra 8192 /dev/sdb
echo deadline > /sys/block/sdb/queue/scheduler
vm.swappiness=0
vm.overcommit_memory=2
vm.dirty_ratio=2
vm.dirty_background_ratio=1
kernel.shmmax=3454820352
kernel.shmall=2048341
$PGDATA is on xfs (rw,noatime)
tmpfs on /db/9.2/pg_stat_tmp_ram type tmpfs (rw,size=50M,uid=1001,gid=1001)
kernel 3.2.0-32-generic
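
(A sketch of making those tweaks persistent, run as root; file paths
are the stock Ubuntu ones and may need adjusting:

# sysctl settings survive a reboot if appended to /etc/sysctl.conf
cat >> /etc/sysctl.conf <<'EOF'
vm.swappiness = 0
vm.overcommit_memory = 2
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1
kernel.shmmax = 3454820352
kernel.shmall = 2048341
EOF
sysctl -p    # reload the values without rebooting

# readahead and the I/O scheduler reset at boot, so reapply them
# at startup, e.g. from /etc/rc.local:
/sbin/blockdev --setra 8192 /dev/sdb
echo deadline > /sys/block/sdb/queue/scheduler
)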

Andrea


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: John Lister <john(dot)lister(at)kickstone(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-06 09:33:06
Message-ID: 50C06652.2080902@opinioni.net
Lists: pgsql-performance

[added performance list back]

On 12/06/2012 10:04 AM, John Lister wrote:
> Thanks for the info, I'll have a play and see what values I get with similar settings, etc

you're welcome

> Still think something is wrong with my config, but we'll see.

Which kind of SSD disks do you have?
Maybe they are of the same type Shaun Thomas is having problems with here:
http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php

Andrea



From: John Lister <john(dot)lister(at)kickstone(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-06 11:37:30
Message-ID: 50C0837A.6010602@kickstone.com
Lists: pgsql-performance

On 06/12/2012 09:33, Andrea Suisani wrote:
>
> Which kind of SSD disks do you have?
> Maybe they are of the same type Shaun Thomas is having problems with here:
> http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php
Yeah, I saw that post. I'm running the same version of Ubuntu with the
3.2 kernel, so when I get a chance to take the box down I'll try the
newer kernels, although Ubuntu is on 3.5 now... Shaun didn't post what
hardware he was running on, so it would be interesting to see how it
compares. My drives are Intel 320s, which, while not the newest, should
offer some protection against power failure, etc.

John


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: John Lister <john(dot)lister(at)kickstone(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Date: 2012-12-06 12:53:23
Message-ID: 50C09543.4060901@opinioni.net
Lists: pgsql-performance

On 12/06/2012 12:37 PM, John Lister wrote:
> On 06/12/2012 09:33, Andrea Suisani wrote:
>>
>> Which kind of SSD disks do you have?
>> Maybe they are of the same type Shaun Thomas is having problems with here:
>> http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php
> Yeah, I saw that post. I'm running the same version of Ubuntu with the 3.2 kernel, so when I get a chance to take the box down I'll try the newer kernels, although Ubuntu is on 3.5 now... Shaun didn't post what hardware he was running on, so it would be interesting to see how it compares. My drives
> are Intel 320s, which, while not the newest, should offer some protection against power failure, etc.

Reading the thread again, I realized Shaun is using the FusionIO
driver, and he said that the regression is due to "some recent 3.2
kernel patch borks the driver in some horrible way".

So maybe you're not in the same boat (since you're using Intel 320s),
or maybe the kernel regression he's referring to is in the kernel
subsystem that deals with SSD disks regardless of brand. In the latter
case, testing a different kernel would be worthwhile.
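
(One low-effort way to test that on 12.04, assuming Ubuntu's
hardware-enablement packages are available on your mirror:

uname -r                                          # currently 3.2.0-32-generic
sudo apt-get install linux-generic-lts-quantal    # 12.04 HWE 3.5 kernel; verify the package name first
sudo reboot

then rerun the same pgbench numbers on both kernels.)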

Andrea