Re: 9.3 regression with dbt2

Lists: pgsql-hackers
From: Dong Ye <yed(at)vmware(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: 9.3 regression with dbt2
Date: 2013-12-18 18:24:56
Message-ID: d54b8387.000012d8.00000010@YED-DEVD1.vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

We recently observed ~15% performance regression with dbt2 from PG 9.3.

We narrowed down on testing master between 9.2 cut and 9.3 cut.

It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit
commit.

We did several runs and perf profiling comparing it against its parent
(f925c79b9f36c54b67053ade5ad225a75b8dc803).

Also tested a 12/16 commit on the master
(3b97e6823b949624afdc3ce4c92b29a80429715f) once, it performed similarly as
0ac..

Regards,

Dong

Results:

f92: 53k-56k'ish notpm

0ac: 47k-48k'ish notpm

Server SUT:

HP ML350 G6

Two Xeon E5520 (4c/p, 8 cores total, hyper-threading disabled)

12GB DRAM

HP P410i RAID controller (256MB battery-backed cache)

- three 10k-rpm SAS: /

- three 10k-rpm SAS: /pgdata

- one 15k-rpm SAS: /pgxlog

- ext4 (rw,relatime,data=ordered) on all mounts.

Fedora 19 (3.11.10-200.fc19.x86_64)

max_connections=100

shared_buffers=8192MB

effective_cache_size=10GB

temp_buffers=8186kB

work_mem=4093kB

maintenance_work_mem=399MB

wal_buffers=-1

checkpoint_segments=300

checkpoint_completion_target=0.9

logging_collector=on

log_timezone=UTC

datestyle='iso, mdy'

lc_messages=C

lc_monetary=C

lc_numeric=C

lc_time=C

default_text_search_config='pg_catalog.english'

listen_addresses='*'

log_destination=csvlog

log_directory=pg_log

log_filename='pg-%a'

log_rotation_age=1440

log_truncate_on_rotation=on

Client and workload:

Dell 390. Two core. Direct connect with the Server SUT.

dbt2 (ToT)

40 warehouse

8 terminals, 8 connections

zero think/key time

12-min run

Flat perf profiles of two such runs look like:

f92:

Samples: 608K of event 'cycles', Event count (approx.): 6679607097416

+ 4.04% postgres postgres [.]
heap_hot_search_buffer

+ 3.63% postgres postgres [.] AllocSetAlloc

+ 3.37% postgres postgres [.]
hash_search_with_hash_value

+ 2.85% postgres postgres [.] _bt_compare

+ 2.67% postgres postgres [.] SearchCatCache

+ 2.46% postgres postgres [.] LWLockAcquire

+ 2.16% postgres postgres [.] XLogInsert

+ 2.08% postgres postgres [.] PinBuffer

+ 1.32% postgres postgres [.] ExecInitExpr

+ 1.31% postgres libc-2.17.so [.] _int_malloc

+ 1.29% swapper [kernel.kallsyms] [k] intel_idle

+ 1.23% postgres postgres [.]
MemoryContextAllocZeroAligned

+ 1.13% postgres postgres [.]
heap_page_prune_opt

+ 1.06% postgres libc-2.17.so [.]
__memcpy_ssse3_back

+ 1.02% postgres postgres [.] LWLockRelease

+ 0.94% postgres postgres [.] copyObject

+ 0.89% postgres postgres [.]
fmgr_info_cxt_security

+ 0.82% postgres postgres [.] _bt_checkkeys

+ 0.81% postgres postgres [.] hash_any

+ 0.73% postgres postgres [.] FunctionCall2Coll

+ 0.69% postgres libc-2.17.so [.]
__strncpy_sse2_unaligned

+ 0.67% postgres postgres [.]
HeapTupleSatisfiesMVCC

+ 0.66% postgres postgres [.] MemoryContextAlloc

+ 0.65% postgres postgres [.]
expression_tree_walker

+ 0.59% postgres postgres [.] check_stack_depth

+ 0.57% postgres libc-2.17.so [.] __printf_fp

+ 0.56% postgres libc-2.17.so [.] _int_free

+ 0.52% postgres postgres [.] base_yyparse

0ac:

Samples: 706K of event 'cycles', Event count (approx.): 6690377376522

+ 3.82% postgres postgres [.]
GetMultiXactIdMembers

+ 3.43% postgres postgres [.] LWLockAcquire

+ 3.31% postgres postgres [.]
hash_search_with_hash_value

+ 3.09% postgres postgres [.]
heap_hot_search_buffer

+ 3.00% postgres postgres [.] AllocSetAlloc

+ 2.56% postgres postgres [.] _bt_compare

+ 2.19% postgres postgres [.] PinBuffer

+ 2.13% postgres postgres [.] SearchCatCache

+ 1.99% postgres postgres [.] XLogInsert

+ 1.53% postgres postgres [.] LWLockRelease

+ 1.32% postgres postgres [.]
HeapTupleSatisfiesMVCC

+ 1.31% swapper [kernel.kallsyms] [k] intel_idle

+ 1.25% postgres postgres [.] ExecInitExpr

+ 1.21% postgres postgres [.]
heap_page_prune_opt

+ 1.03% postgres libc-2.17.so [.] _int_malloc

+ 0.99% postgres postgres [.]
MemoryContextAllocZeroAligned

+ 0.97% postgres libc-2.17.so [.]
__memcpy_ssse3_back

+ 0.89% postgres postgres [.] heapgetpage

+ 0.76% postgres postgres [.] hash_any

+ 0.69% postgres postgres [.]
fmgr_info_cxt_security

+ 0.69% postgres postgres [.] _bt_checkkeys

+ 0.65% postgres postgres [.] copyObject

+ 0.63% postgres postgres [.]
FunctionCall2Coll

+ 0.58% postgres postgres [.]
MemoryContextAlloc

+ 0.56% postgres libc-2.17.so [.]
__strncpy_sse2_unaligned

+ 0.53% postgres postgres [.]
XidInMVCCSnapshot

+ 0.52% postgres [kernel.kallsyms] [k]
copy_user_generic_string

+ 0.51% postgres libc-2.17.so [.] __printf_fp

+ 0.50% postgres libc-2.17.so [.] vfprintf

P.S.

The regression was found when testing vPostgres performance.

During the narrow-down, we found the commit
005f583ba4e6d4d19b62959ef8e70a3da4d188a5 (on REL9_2_STABLE) improves
performance (versus its parent commit) by 5-10%. Perhaps serve an extra
data point for the patch.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Dong Ye <yed(at)vmware(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 regression with dbt2
Date: 2013-12-18 19:17:44
Message-ID: 20131218191744.GB26481@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

On 2013-12-18 10:24:56 -0800, Dong Ye wrote:
> It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit
> commit.

How long does a run take to verify the problem? Could you retry with the
patch attached to
http://www.postgresql.org/message-id/20131201114514.GG18793@alap2.anarazel.de
? Based on the theory that it creates many superflous multixacts.

> Flat perf profiles of two such runs look like:

Those aren't really flat profiles tho ;)

> 0ac:
>
> Samples: 706K of event 'cycles', Event count (approx.): 6690377376522
>
>
> + 3.82% postgres postgres [.]
> GetMultiXactIdMembers

Could you expland that one some levels, so we see the callers?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Dong Ye <yed(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 regression with dbt2
Date: 2013-12-18 19:59:45
Message-ID: CAEHKxOFs4eQzoVeknC+z5K8sP08_pS4Y7bEA=KZO_U-WE16BCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

~20 minutes each run with binary.
Try your patch now..
You are right I used -g in perf record. But what I reported was flat (meant
as a start).

Expand GetMultiXactIdMembers:

3.82% postgres postgres [.]
GetMultiXactIdMembers
|
|--9.09%-- GetMultiXactIdMembers
|
|--0.84%-- 0x48fb894853f58948
| |
| |--0.74%-- 0x4296e0004296c
| | GetMultiXactIdMembers
| |
| |--0.64%-- 0x52f8d00052f8d
| | GetMultiXactIdMembers
| |
| |--0.64%-- 0xf6ce8000f6ce8
| | GetMultiXactIdMembers
| |
| |--0.62%-- 0x41de300041de1
| | GetMultiXactIdMembers
| |
| |--0.61%-- 0xf2c77000f2c71
| | GetMultiXactIdMembers
| |
| |--0.60%-- 0x3127700031275
| | GetMultiXactIdMembers
| |
| |--0.59%-- 0x10c98b0010c987
| | GetMultiXactIdMembers
| |
| |--0.59%-- 0x31df000031df0
| | GetMultiXactIdMembers
| |
| |--0.59%-- 0xbefbd000befbd
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0xfe97c000fe976
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x82501000824f9
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x3a4410003a43c
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x3b0cf0003b0c3
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x5325f0005325b
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x7b6b80007b6b8
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0x52e9b00052e9b
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0xf3d45000f3d40
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0x27afd00027afa
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0x3244d0003244d
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0x53e0d00053e06
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0xb64c6000b64bc
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0x423f1000423ef
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0xc18f2000c18ed
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0x6bdcf0006bdcd
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xc6d25000c6d25
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xf6534000f6534
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0x10bba80010bba0
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xb5a76000b5a6e
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0x2d3c10002d3b5
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xcc095000cc095
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0x2e3340002e334
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0xafffc000afff9
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x68caf00068caf
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x5ec090005ec09
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x51eb100051eac
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x5402000054020
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0xe352a000e3526
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x6a5d30006a5d3
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x2860b00028606
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0xce9ad000ce9ab
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x42d4000042d40
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x27e6900027e69
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x7aaf20007aaef
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0x2b6150002b615
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0x2969200029692
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0xc801e000c8016
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0xc72c5000c72bf
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0xc2018000c2018
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0x6ad840006ad7b
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0x80b4600080b41
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0xd627a000d6276
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0xfafb9000fafb9
| | GetMultiXactIdMembers
| |
| |--0.50%-- 0xc6b3c000c6b38
| | GetMultiXactIdMembers
| |
| |--0.50%-- 0x7207700072072
| | GetMultiXactIdMembers
| --69.92%-- [...]
--90.08%-- [...]

Thanks,
Dong

On Wed, Dec 18, 2013 at 2:17 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> Hello,
>
> On 2013-12-18 10:24:56 -0800, Dong Ye wrote:
> > It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit
> > commit.
>
> How long does a run take to verify the problem? Could you retry with the
> patch attached to
>
> http://www.postgresql.org/message-id/20131201114514.GG18793@alap2.anarazel.de
> ? Based on the theory that it creates many superflous multixacts.
>
> > Flat perf profiles of two such runs look like:
>
> Those aren't really flat profiles tho ;)
>
> > 0ac:
> >
> > Samples: 706K of event 'cycles', Event count (approx.): 6690377376522
> >
> >
> > + 3.82% postgres postgres [.]
> > GetMultiXactIdMembers
>
> Could you expland that one some levels, so we see the callers?
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Dong Ye <yed(at)vmware(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 regression with dbt2
Date: 2013-12-18 20:03:22
Message-ID: 20131218200322.GD26481@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2013-12-18 14:59:45 -0500, Dong Ye wrote:
> ~20 minutes each run with binary.
> Try your patch now..
> You are right I used -g in perf record. But what I reported was flat (meant
> as a start).
>
> Expand GetMultiXactIdMembers:
>
> 3.82% postgres postgres [.]
> GetMultiXactIdMembers

That looks like a postgres compiled without
-fno-omit-frame-pointer. Without that hierarchical profiles are
meaningless. Very new perfs can also do it using dwarf, but it's unusabl
slow...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Dong Ye <yed(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 regression with dbt2
Date: 2013-12-18 20:13:59
Message-ID: CAEHKxOFvXN2mSjj=OXotrdsMXLWH=zoZbsA=RH-1qNSz30oULA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HI

On Wed, Dec 18, 2013 at 3:03 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> That looks like a postgres compiled without
> -fno-omit-frame-pointer. Without that hierarchical profiles are
> meaningless. Very new perfs can also do it using dwarf, but it's unusabl
> slow...
>
> Let me complete current test without the flag (to serve as a checkpoint as
well). I will do a run with the flag after.

Thanks,
Dong


From: Dong Ye <yed(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 regression with dbt2
Date: 2013-12-18 20:27:08
Message-ID: CAEHKxOHV6V_x+zsq4WspQw99qSDHFrGMe=RjTqTCuaJzDHhXWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Applied your patch (but not using -fno-omit-frame-pointer). It seems
recover the perf loss: 55659.72 notpm.
FWIW, the profile is below. I will do a run with the flag..
Samples: 598K of event 'cycles', Event count (approx.): 6568957160059

+ 4.03% postgres postgres [.]
hash_search_with_hash_value
+ 3.74% postgres postgres [.]
heap_hot_search_buffer
+ 3.17% postgres postgres [.] LWLockAcquire
+ 2.92% postgres postgres [.] _bt_compare
+ 2.64% postgres postgres [.] PinBuffer
+ 2.62% postgres postgres [.] AllocSetAlloc
+ 2.34% postgres postgres [.] XLogInsert
+ 2.27% postgres postgres [.] SearchCatCache
+ 1.81% postgres postgres [.]
HeapTupleSatisfiesMVCC
+ 1.52% postgres postgres [.] ExecInitExpr
+ 1.45% postgres postgres [.] heap_page_prune_opt
+ 1.41% swapper [kernel.kallsyms] [k] intel_idle
+ 1.30% postgres postgres [.] LWLockRelease
+ 1.12% postgres postgres [.] heapgetpage
+ 0.96% postgres libc-2.17.so [.] _int_malloc
+ 0.86% postgres libc-2.17.so [.] __memcpy_ssse3_back
+ 0.84% postgres postgres [.] hash_any
+ 0.81% postgres postgres [.]
MemoryContextAllocZeroAligned
+ 0.76% postgres postgres [.] XidInMVCCSnapshot
+ 0.74% postgres postgres [.] _bt_checkkeys
+ 0.70% postgres postgres [.]
fmgr_info_cxt_security
+ 0.70% postgres postgres [.] FunctionCall2Coll
+ 0.66% postgres libc-2.17.so [.]
__strncpy_sse2_unaligned
+ 0.63% postgres postgres [.] tbm_iterate
+ 0.58% postgres postgres [.] base_yyparse
+ 0.58% postgres libc-2.17.so [.] __printf_fp
+ 0.55% postgres postgres [.] palloc
+ 0.51% postgres postgres [.]
TransactionIdPrecedes
+ 0.51% postgres postgres [.] slot_deform_tuple
+ 0.50% postgres postgres [.] ReadBuffer_common


From: Dong Ye <yed(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 regression with dbt2
Date: 2013-12-18 21:57:40
Message-ID: CAEHKxOFJNufZhh1RKzDCQQNSJtsZoTCB6rnE7+VWvJLvr_=-Ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Applying your patch plus adding -fno-omit-frame-pointer, I got 54526.90
notpm.

The profile (part) below:
# Samples: 610K of event 'cycles'
# Event count (approx.): 6686532056450
#
# Overhead Command Shared Object
Symbol
# ........ .......... .............
..................................................
#
4.08% postgres postgres [.] hash_search_with_hash_value

|
--- hash_search_with_hash_value
|
--2.87%-- BufTableLookup
|
--2.86%-- ReadBuffer_common
ReadBufferExtended
|
|--0.86%-- index_fetch_heap
| |
| --0.62%-- index_getnext
| IndexNext
| ExecScan
| ExecProcNode
|
|--0.71%-- BitmapHeapNext
| ExecScan
| ExecProcNode
|
--0.57%-- _bt_relandgetbuf
|
--0.57%-- _bt_search

3.75% postgres postgres [.] heap_hot_search_buffer

|
--- heap_hot_search_buffer
|
|--1.92%-- BitmapHeapNext
| ExecScan
| ExecProcNode
| |
| --1.12%-- standard_ExecutorRun
| _SPI_execute_plan
| SPI_execute_plan
| |
| --0.70%-- payment
| ExecMakeFunctionResult
| ExecProject
| ExecResult
| ExecProcNode
| standard_ExecutorRun
| PortalRunSelect
| PortalRun
| PostgresMain
| ServerLoop
| PostmasterMain
| main
| __libc_start_main
|
--1.74%-- index_fetch_heap
|
--1.46%-- index_getnext
|
--1.45%-- IndexNext
ExecScan
ExecProcNode
|
--0.96%--
standard_ExecutorRun

_SPI_execute_plan

SPI_execute_plan
|
--0.50%--
new_order

ExecMakeFunctionResult

ExecProject

ExecResult

ExecProcNode

standard_ExecutorRun

PortalRunSelect

PortalRunFetch

PerformPortalFetch

standard_ProcessUtility

PortalRunUtility

FillPortalStore

PortalRun

PostgresMain

ServerLoop

PostmasterMain
main

__libc_start_main

3.15% postgres postgres [.] LWLockAcquire

|
--- LWLockAcquire
|
--1.65%-- ReadBuffer_common
ReadBufferExtended

2.74% postgres postgres [.] PinBuffer

|
--- PinBuffer
|
--2.72%-- ReadBuffer_common
ReadBufferExtended
|
|--0.71%-- index_fetch_heap
| |
| --0.51%-- index_getnext
| IndexNext
| ExecScan
| ExecProcNode
|
|--0.67%-- BitmapHeapNext
| ExecScan
| ExecProcNode
|
--0.60%-- heapgetpage
heapgettup_pagemode
heap_getnext
SeqNext
ExecScan
ExecProcNode

2.72% postgres postgres [.] _bt_compare

|
--- _bt_compare
|
--2.51%-- _bt_binsrch
|
|--1.18%-- _bt_search
| |
| --0.74%-- _bt_first
| |
| --0.72%-- btgettuple
|
FunctionCall2Coll
|
index_getnext_tid
| |
| --0.58%--
index_getnext
| |

|
--0.58%-- IndexNext
|
ExecScan
|
ExecProcNode
|
--0.97%-- _bt_first
|
--0.94%-- btgettuple
FunctionCall2Coll
index_getnext_tid
|
--0.66%-- index_getnext
|
--0.66%--
IndexNext

ExecScan

ExecProcNode
|

--0.51%-- standard_ExecutorRun

_SPI_execute_plan

SPI_execute_plan

2.47% postgres postgres [.] AllocSetAlloc

|
--- AllocSetAlloc
|
--0.54%-- MemoryContextAllocZeroAligned