Re: FSM patch - performance test

Lists: pgsql-hackers
From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: FSM patch - performance test
Date: 2008-09-18 11:42:30
Message-ID: 48D23EA6.3030601@sun.com

Hi Heikki,

I finally ran the iGen test. I used two v490 servers with four dual-core SPARC
CPUs and 32GB RAM. I have only one disk, and I did not perform any disk I/O
optimization. I tested 105 parallel connections with a 200ms think time.
See the results:

Original:
---------
Actual run/snap-shot time: 3004 sec

MQThL (Maximum Qualified Throughput LIGHT): 1458.76 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 3122.44 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2626.70 tpm

TRANSACTION MIX

Total number of transactions = 438133
TYPE TX. COUNT MIX
---- --------- ---
Light: 72938 16.65%
Medium: 156122 35.63%
DSS: 48516 11.07%
Heavy: 131335 29.98%
Connection: 29222 6.67%

RESPONSE TIMES AVG. MAX. 90TH

Light 0.541 3.692 0.800
Medium 0.542 3.702 0.800
DSS 0.539 3.510 0.040
Heavy 0.539 3.742 4.000
Connections 0.545 3.663 0.800
Number of users = 105
Sum of Avg. RT * TPS for all Tx. Types = 64.851454

New FSM implementation:
-----------------------
Actual run/snap-shot time: 3004 sec

MQThL (Maximum Qualified Throughput LIGHT): 1351.20 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 2888.74 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2428.90 tpm

TRANSACTION MIX

Total number of transactions = 405502
TYPE TX. COUNT MIX
---- --------- ---
Light: 67560 16.66%
Medium: 144437 35.62%
DSS: 45028 11.10%
Heavy: 121445 29.95%
Connection: 27032 6.67%

RESPONSE TIMES AVG. MAX. 90TH

Light 0.596 3.735 0.800
Medium 0.601 3.748 0.800
DSS 0.601 3.695 0.040
Heavy 0.597 3.725 4.000
Connections 0.599 3.445 0.800
Number of users = 105
Sum of Avg. RT * TPS for all Tx. Types = 66.419466

----------------------------

My conclusion is that the new implementation is about 8% slower in an OLTP workload.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 11:47:19
Message-ID: 48D23FC7.4030309@enterprisedb.com
Lists: pgsql-hackers

Zdenek Kotala wrote:
> My conclusion is that new implementation is about 8% slower in OLTP
> workload.

Thanks. That's very disappointing :-(

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 12:02:07
Message-ID: 48D2433F.20809@enterprisedb.com
Lists: pgsql-hackers

Zdenek Kotala wrote:
> My conclusion is that new implementation is about 8% slower in OLTP
> workload.

Can you do some analysis of why that is?

Looks like I need to blow the dust off my DBT-2 test rig and try to
reproduce that as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 13:04:43
Message-ID: 48D251EB.6060605@sun.com
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> My conclusion is that new implementation is about 8% slower in OLTP
>> workload.
>
> Can you do some analysis of why that is?

I'll try something, but I can't guarantee results.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 16:19:12
Message-ID: 21446.1221754752@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Zdenek Kotala wrote:
>> My conclusion is that new implementation is about 8% slower in OLTP
>> workload.

> Thanks. That's very disappointing :-(

One thing that jumped out at me is that you call FreeSpaceMapExtendRel
every time a rel is extended by even one block. I admit I've not
studied the data structure in any detail yet, but surely most such calls
end up being a no-op? Seems like some attention to making a fast path
for that case would be helpful.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 18:04:40
Message-ID: 48D29838.9040405@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Zdenek Kotala wrote:
>>> My conclusion is that new implementation is about 8% slower in OLTP
>>> workload.
>
>> Thanks. That's very disappointing :-(
>
> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
> every time a rel is extended by even one block. I admit I've not
> studied the data structure in any detail yet, but surely most such calls
> end up being a no-op? Seems like some attention to making a fast path
> for that case would be helpful.

Yes, most of those calls end up being no-op. Which is exactly why I
would be surprised if those made any difference. It does call
smgrnblocks(), though, which isn't completely free...

Zdenek, can you say off the top of your head whether the test was I/O
bound or CPU bound? What was the CPU utilization % during the test?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 18:26:32
Message-ID: 23843.1221762392@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
>> every time a rel is extended by even one block.

> Yes, most of those calls end up being no-op. Which is exactly why I
> would be surprised if those made any difference. It does call
> smgrnblocks(), though, which isn't completely free...

No, it's a kernel call (at least one) which makes it pretty expensive.

I wonder whether it's necessary to do FreeSpaceMapExtendRel at this
point at all? Why not lazily extend the map when you are told to store
a nonzero space category for a page that's off the end of the map?
Whether or not this saved many cycles overall, it'd push most of the map
extension work to VACUUM instead of having it happen in foreground.

A further refinement would be to extend the map only for a space
category "significantly" greater than zero --- maybe a quarter page or
so. For an insert-only table that would probably result in the map
never growing at all, which might be nice. However it would go back to
the concept of FSM being lossy; I forget whether you were hoping to get
away from that.

regards, tom lane


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-18 19:05:50
Message-ID: 48D2A68E.2060808@sun.com
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Tom Lane wrote:
>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> Zdenek Kotala wrote:
>>>> My conclusion is that new implementation is about 8% slower in OLTP
>>>> workload.
>>
>>> Thanks. That's very disappointing :-(
>>
>> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
>> every time a rel is extended by even one block. I admit I've not
>> studied the data structure in any detail yet, but surely most such calls
>> end up being a no-op? Seems like some attention to making a fast path
>> for that case would be helpful.
>
> Yes, most of those calls end up being no-op. Which is exactly why I
> would be surprised if those made any difference. It does call
> smgrnblocks(), though, which isn't completely free...

It is not the problem. It is really strange. I'm using DTrace to count the number of
calls, and the number of calls is really small (I monitor only one backend). I have
also removed WAL logging, and that does not help either.

> Zdenek, can you say off the top of your head whether the test was I/O
> bound or CPU bound? What was the CPU utilization % during the test?

CPU is not the problem; it is mostly idle:

-bash-3.00# iostat 5
tty sd1 ssd0 ssd1 nfs1 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
0 1 0 0 1 9 1 92 0 0 0 0 0 0 0 0 0 100
0 47 0 0 0 894 111 7 0 0 0 0 0 0 2 1 0 97
0 16 0 0 0 949 118 6 0 0 0 0 0 0 2 2 0 97
0 16 0 0 0 965 120 6 0 0 0 0 0 0 2 1 0 97
0 16 0 0 0 981 122 7 0 0 0 0 0 0 2 2 0 96
0 16 0 0 0 944 118 6 0 0 0 0 0 0 2 1 0 97
0 16 0 0 0 1202 149 7 0 0 0 0 0 0 3 2 0 95
0 16 0 0 0 1261 157 9 0 0 0 0 0 0 3 2 0 95
0 16 0 0 0 1357 168 14 0 0 0 0 0 0 3 2 0 95
0 16 0 0 0 1631 201 33 0 0 0 0 0 0 2 2 0 96
0 16 0 0 0 1973 246 48 0 0 0 0 0 0 2 2 0 96
0 16 0 0 0 2008 251 50 0 0 0 0 0 0 2 2 0 97
0 16 0 0 0 1956 241 45 0 0 0 0 0 0 2 2 0 97
0 16 0 0 0 2003 250 49 0 0 0 0 0 0 2 2 0 97

-bash-3.00# vmstat 1
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 sd sd -- in sy cs us sy id
0 0 0 28091000 31640552 3 4 0 0 0 0 0 0 1 0 0 359 72 206 0 0 100
0 0 0 27363144 27614576 3 28 0 16 16 0 0 0 60 0 0 1216 1134 1072 1 1 99
0 0 0 27363144 27614568 8 0 0 16 16 0 0 0 52 0 0 1099 1029 964 0 1 98
0 0 0 27363144 27614560 9 0 0 8 8 0 0 0 53 0 0 1143 896 1009 1 1 98
0 0 0 27363144 27614544 1 241 0 16 16 0 0 0 46 0 0 1042 1105 895 0 1 98
0 0 0 27363144 27614544 0 0 0 16 16 0 0 0 50 0 0 1078 860 924 0 0 99
0 0 0 27363144 27614552 10 0 0 16 16 0 0 0 56 0 0 1177 914 1033 1 1 98
0 0 0 27363144 27614536 0 0 0 8 8 0 0 0 25 0 0 726 554 603 0 0 99
0 0 0 27363144 27614528 1 0 0 16 16 0 0 0 65 0 0 1206 1159 1081 1 1 98
0 0 0 27363144 27614512 13 0 0 16 16 0 0 0 63 0 0 1256 1088 1094 1 1 99
0 0 0 27363144 27614512 0 0 0 8 8 0 0 0 37 0 0 920 797 779 0 1 99
0 0 0 27363144 27614504 6 0 0 16 16 0 0 0 58 0 0 1218 1074 1078 1 0 99
0 0 0 27363144 27614488 85 91 0 16 16 0 0 0 45 0 0 973 1344 833 1 1 99
0 0 0 27363144 27614488 2 0 0 16 16 0 0 0 57 0 0 1164 1023 1036 1 1 99
0 0 0 27363144 27614472 4 0 0 8 8 0 0 0 47 0 0 1133 937 957 0 1 99

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-19 14:44:04
Message-ID: 48D3BAB4.4080105@sun.com
Lists: pgsql-hackers

Zdenek Kotala napsal(a):
> Heikki Linnakangas napsal(a):
>> Zdenek Kotala wrote:
>>> My conclusion is that new implementation is about 8% slower in OLTP
>>> workload.
>>
>> Can you do some analysis of why that is?

I tested it several times, and the last test was a surprise for me. I ran the original
server (with the old FSM) on a database which had been created by the new server (with
the new FSM), and performance is similar (maybe the new implementation is a little bit better):

MQThL (Maximum Qualified Throughput LIGHT): 1348.90 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 2874.76 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2422.20 tpm

The question is why? There could be two reasons for that. One is related to the
OS/FS or HW: the filesystem could be fragmented, or the HDD could be slower in some part...

The second idea is that the new FSM creates heavily fragmented data, and index scans
need to jump from one page to another too often.

Thoughts?

Zdenek

PS: I'm leaving now and I will be online on Monday.

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-19 20:28:32
Message-ID: 48D40B70.7010002@enterprisedb.com
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Zdenek Kotala napsal(a):
>> Heikki Linnakangas napsal(a):
>>> Zdenek Kotala wrote:
>>>> My conclusion is that new implementation is about 8% slower in OLTP
>>>> workload.
>>>
>>> Can you do some analysis of why that is?
>
> I tested it several times and last test was surprise for me. I run
> original server (with old FSM) on the database which has been created by
> new server (with new FSM) and performance is similar (maybe new
> implementation is little bit better):
>
> MQThL (Maximum Qualified Throughput LIGHT): 1348.90 tpm
> MQThM (Maximum Qualified Throughput MEDIUM): 2874.76 tpm
> MQThH (Maximum Qualified Throughput HEAVY): 2422.20 tpm
>
> The question is why? There could be two reasons for that. One is
> realated to OS/FS or HW. Filesystem could be defragmented or HDD is
> slower in some part...

Ugh. Could it be autovacuum kicking in at different times? Do you get
any other metrics than the TPM out of it?

> Second idea is that new FSM creates heavy defragmented data and index
> scan needs to jump from one page to another too often.

Hmm. That's remotely plausible, I suppose. The old FSM only kept track
of pages with more than the average request size of free space, but the new FSM
tracks even the smallest free spots. Are there tables in that workload
that are inserted to, with widely varying row widths?

FWIW, I just got the results of my first 2h DBT-2 run, and I'm
seeing no difference at all in overall performance or behavior
during the test. Autovacuum doesn't kick in in those short tests,
though, so I scheduled a pair of 4h tests, and might run even longer
tests over the weekend.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-19 23:30:21
Message-ID: 27703.1221867021@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Zdenek Kotala wrote:
>> Second idea is that new FSM creates heavy defragmented data and index
>> scan needs to jump from one page to another too often.

> Hmm. That's remotely plausible, I suppose. The old FSM only kept track
> of pages with more than avg. request size of free space, but the new FSM
> tracks even the smallest free spots. Is there tables in that workload
> that are inserted to, with very varying row widths?

I'm not sure I buy that either. But after thinking a bit about how
search_avail() works, it occurs to me that it's not doing what the old
code did and that might contribute to contention. The old FSM did a
cyclic search through the pages it knew about, so as long as there were
plenty of pages with "enough" free space, different backends would
always get pointed to different pages. But consider what the algorithm
is now. (For simplicity, consider only the behavior on a leaf FSM page.)

* Starting from the "next" slot, bubble up to parent nodes until finding
a parent showing enough space.

* Descend to the *leftmost* leaf child of that parent that has enough
space.

* Point "next" to the slot after that, and return that page.

What this means is that if we start with "next" pointing at a page
without enough space (quite likely considering that we now index all
pages not only those with free space), then it is highly possible that
the search will end on a page *before* where next was. The most trivial
case is that we have an even-numbered page with a lot of free space and
its odd-numbered successor has none --- in this case, far from spreading
out the backends, all comers will be handed back that same page! (Until
someone reports that it's full.) In general it seems that this behavior
will tend to concentrate the returned pages in a small area rather than
allowing them to range over the whole FSM page as was intended.

So the bottom line is that the "next" addition doesn't actually work and
needs to be rethought. It might be possible to salvage it by paying
attention to "next" during the descent phase and preferentially trying
to descend to the right of "next"; but I'm not quite sure how to make
that work efficiently, and even less sure how to wrap around cleanly
when the starting value of "next" is near the last slot on the page.

regards, tom lane


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-22 15:41:46
Message-ID: 48D7BCBA.5030908@sun.com
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Zdenek Kotala napsal(a):
>>> Heikki Linnakangas napsal(a):
>>>> Zdenek Kotala wrote:
>>>>> My conclusion is that new implementation is about 8% slower in OLTP
>>>>> workload.
>>>>
>>>> Can you do some analysis of why that is?
>>
>> I tested it several times and last test was surprise for me. I run
>> original server (with old FSM) on the database which has been created
>> by new server (with new FSM) and performance is similar (maybe new
>> implementation is little bit better):
>>
>> MQThL (Maximum Qualified Throughput LIGHT): 1348.90 tpm
>> MQThM (Maximum Qualified Throughput MEDIUM): 2874.76 tpm
>> MQThH (Maximum Qualified Throughput HEAVY): 2422.20 tpm
>>
>> The question is why? There could be two reasons for that. One is
>> realated to OS/FS or HW. Filesystem could be defragmented or HDD is
>> slower in some part...
>
> Ugh. Could it be autovacuum kicking in at different times? Do you get
> any other metrics than the TPM out of it.

I don't think it is an autovacuum problem. I ran the test several times and the result
was the same. But today I created a fresh database and got similar throughput for the
original and new FSM implementations. It seems to me that I hit a HW/OS
singularity. I'll verify it tomorrow.

I noticed only a little slowdown during index creation (4:11 min vs.
3:47 min), but I tested it only once.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM patch - performance test
Date: 2008-09-22 18:44:00
Message-ID: 48D7E770.80407@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> What this means is that if we start with "next" pointing at a page
> without enough space (quite likely considering that we now index all
> pages not only those with free space), then it is highly possible that
> the search will end on a page *before* where next was. The most trivial
> case is that we have an even-numbered page with a lot of free space and
> its odd-numbered successor has none --- in this case, far from spreading
> out the backends, all comers will be handed back that same page! (Until
> someone reports that it's full.) In general it seems that this behavior
> will tend to concentrate the returned pages in a small area rather than
> allowing them to range over the whole FSM page as was intended.

Good point.

> So the bottom line is that the "next" addition doesn't actually work and
> needs to be rethought. It might be possible to salvage it by paying
> attention to "next" during the descent phase and preferentially trying
> to descend to the right of "next"; but I'm not quite sure how to make
> that work efficiently, and even less sure how to wrap around cleanly
> when the starting value of "next" is near the last slot on the page.

Yeah, I think it can be salvaged like that. See the patch I just posted
on a separate thread.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com