Do you know the reason for increased max latency due to xlog scaling?

Lists: pgsql-hackers
From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-17 15:43:54
Message-ID: 14B45CC96F564D2FA369A4C6D7FB64D9@maumau

Hello Heikki san,

I'm excited about your great work, xlog scaling. I'm looking forward to the
release of 9.4.

Please let me ask you about your performance data on the page:

http://hlinnaka.iki.fi/xloginsert-scaling/padding/

I'm worried about the big increase in max latency. Do you know the cause?
Is it more frequent checkpoints, caused by the increased WAL volume that
comes with the improved performance?

Although I'm not sure this is related to my question, the following code
fragment in WALInsertSlotAcquireOne() caught my eye. Shouldn't the if
condition be "slotno == -1" instead of "!="? I understood this part to make
inserters use a different slot on their next insertion when they fail to
acquire a slot immediately, and inserters pass slotno == -1. I'm sorry if I
misread the code.

/*
* If we couldn't get the slot immediately, try another slot next time.
* On a system with more insertion slots than concurrent inserters, this
* causes all the inserters to eventually migrate to a slot that no-one
* else is using. On a system with more inserters than slots, it still
* causes the inserters to be distributed quite evenly across the slots.
*/
if (slotno != -1 && retry)
slotToTry = (slotToTry + 1) % num_xloginsert_slots;
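
If my reading is right, the fix would be the one-character change below
(just my understanding of the intended behavior, not a tested patch):

if (slotno == -1 && retry)
slotToTry = (slotToTry + 1) % num_xloginsert_slots;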

Regards
MauMau


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-17 15:58:54
Message-ID: 20140217155854.GE18388@awork2.anarazel.de

Hi,

On 2014-02-18 00:43:54 +0900, MauMau wrote:
> Please let me ask you about your performance data on the page:
>
> http://hlinnaka.iki.fi/xloginsert-scaling/padding/
>
> I'm worried about the big increase in max latency. Do you know the cause?
> Is it more frequent checkpoints, caused by the increased WAL volume that
> comes with the improved performance?

I don't see much evidence of increased latency there? You can't really
compare the latency when the throughput is significantly different.

> Although I'm not sure this is related to my question, the following code
> fragment in WALInsertSlotAcquireOne() caught my eye. Shouldn't the if
> condition be "slotno == -1" instead of "!="? I understood this part to make
> inserters use a different slot on their next insertion when they fail to
> acquire a slot immediately, and inserters pass slotno == -1. I'm sorry if I
> misread the code.

I think you're right.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Andres Freund" <andres(at)2ndquadrant(dot)com>
Cc: "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-17 16:35:52
Message-ID: 75E03D4179B24D749DB22D6E5FEBD07C@maumau

From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> On 2014-02-18 00:43:54 +0900, MauMau wrote:
>> I'm worried about the big increase in max latency. Do you know the cause?
>> Is it more frequent checkpoints, caused by the increased WAL volume that
>> comes with the improved performance?
>
> I don't see much evidence of increased latency there? You can't really
> compare the latency when the throughput is significantly different.

For example, please compare the max latencies of test set 2 (PG 9.3) and
test set 4 (xlog scaling with padding): 207.359 ms and 1219.422 ms
respectively. The throughput is of course greatly improved, but I think
response time should be sacrificed as little as possible. Some users, such
as stock exchanges and online games, are sensitive to max latency.

>> Although I'm not sure this is related to my question, the following code
>> fragment in WALInsertSlotAcquireOne() caught my eye. Shouldn't the if
>> condition be "slotno == -1" instead of "!="? I understood this part to
>> make inserters use a different slot on their next insertion when they
>> fail to acquire a slot immediately, and inserters pass slotno == -1. I'm
>> sorry if I misread the code.
>
> I think you're right.

Thanks for the confirmation. I'd be glad if the fix also had a positive
impact on max latency.

Regards
MauMau


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-17 16:46:12
Message-ID: 20140217164612.GH18388@awork2.anarazel.de

On 2014-02-18 01:35:52 +0900, MauMau wrote:
> From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> >On 2014-02-18 00:43:54 +0900, MauMau wrote:
> >>I'm worried about the big increase in max latency. Do you know the cause?
> >>Is it more frequent checkpoints, caused by the increased WAL volume that
> >>comes with the improved performance?
> >
> >I don't see much evidence of increased latency there? You can't really
> >compare the latency when the throughput is significantly different.
>
> For example, please compare the max latencies of test set 2 (PG 9.3) and
> test set 4 (xlog scaling with padding): 207.359 ms and 1219.422 ms
> respectively. The throughput is of course greatly improved, but I think
> response time should be sacrificed as little as possible. Some users, such
> as stock exchanges and online games, are sensitive to max latency.

You need to compare both at the same throughput to have any meaningful
comparison.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Andres Freund" <andres(at)2ndquadrant(dot)com>
Cc: "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 11:49:06
Message-ID: 0BDD0493C4A141F597ED6C1AF1E792E5@maumau

From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> On 2014-02-18 01:35:52 +0900, MauMau wrote:
>> For example, please compare the max latencies of test set 2 (PG 9.3) and
>> test set 4 (xlog scaling with padding): 207.359 ms and 1219.422 ms
>> respectively. The throughput is of course greatly improved, but I think
>> response time should be sacrificed as little as possible. Some users,
>> such as stock exchanges and online games, are sensitive to max latency.
>
> You need to compare both at the same throughput to have any meaningful
> comparison.

I'm sorry for my lack of understanding, but could you tell me why you think
so? When the user upgrades to 9.4 and runs the same workload, he would
experience vastly increased max latency --- or in other words, greater
variance in response times. With my limited understanding, that sounds like
a problem for latency-sensitive users.

Regards
MauMau


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 12:00:58
Message-ID: 20140218120058.GA16471@awork2.anarazel.de

On 2014-02-18 20:49:06 +0900, MauMau wrote:
> From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> >On 2014-02-18 01:35:52 +0900, MauMau wrote:
> >>For example, please compare the max latencies of test set 2 (PG 9.3) and
> >>test set 4 (xlog scaling with padding): 207.359 ms and 1219.422 ms
> >>respectively. The throughput is of course greatly improved, but I think
> >>response time should be sacrificed as little as possible. Some users,
> >>such as stock exchanges and online games, are sensitive to max latency.
> >
> >You need to compare both at the same throughput to have any meaningful
> >comparison.
>
> I'm sorry for my lack of understanding, but could you tell me why you think
> so? When the user upgrades to 9.4 and runs the same workload, he would
> experience vastly increased max latency --- or in other words, greater
> variance in response times.

No, the existing data indicates no such thing. When they upgrade, they will
have the *same* throughput as before. The data points you cite do show an
increase in latency, but it occurs while processing several times as much
data! The highest throughput of set 2 is 3223, while the highest for set 4
is 14145.
To get an interesting latency comparison you'd need to use pgbench --rate
with a rate that *both* versions can sustain.
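
For instance (client counts, duration and rate are made-up numbers, and
"bench" is a placeholder database name):

pgbench -c 32 -j 32 -T 300 --rate=2000 bench

run identically against 9.3 and the patched build, and then compare the
latency distributions at that fixed rate.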

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 16:27:08
Message-ID: CAMkU=1znu42ODAFMkjdmJxgEM7_Ggy1kWKgS2Tr6T_UhSVeq=w@mail.gmail.com

On Tue, Feb 18, 2014 at 3:49 AM, MauMau <maumau307(at)gmail(dot)com> wrote:

> From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
>
>> On 2014-02-18 01:35:52 +0900, MauMau wrote:
>>
>>> For example, please compare the max latencies of test set 2 (PG 9.3) and
>>> test set 4 (xlog scaling with padding): 207.359 ms and 1219.422 ms
>>> respectively. The throughput is of course greatly improved, but I think
>>> response time should be sacrificed as little as possible. Some users,
>>> such as stock exchanges and online games, are sensitive to max latency.
>>>
>>
>> You need to compare both at the same throughput to have any meaningful
>> comparison.
>>
>
> I'm sorry for my lack of understanding, but could you tell me why you
> think so? When the user upgrades to 9.4 and runs the same workload, he
> would experience vastly increased max latency

The tests shown did not test that. They are not running the same workload
on 9.4, but rather a vastly higher one. If we were to throttle the workload
in 9.4 (using pgbench's new -R, for example) to the same level it was in
9.3, we probably would not see the max latency increase. But that was not
tested, so we don't know for sure.

> --- or in other words, greater variance in response times. With my limited
> understanding, that sounds like a problem for latency-sensitive users.
>

If you need the throughput provided by 9.4, then using 9.3 gets you lower
variance simply by refusing to do 80% of the assigned work. If you don't
need the throughput provided by 9.4, then you probably have some natural
throttling in place.

If you want a real-world-like test, you might try to crank up the -c and -j
to the limit in 9.3 in a vain effort to match 9.4's performance, and see
what that does to max latency. (After all, that is what a naive web app is
likely to do--continue to make more and more connections as requests come
in faster than they can finish.)
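
Concretely (the numbers and database name are made up, just to show the
shape of the test), that would be something like:

pgbench -c 200 -j 200 -T 300 bench

on the 9.3 build, versus a modest client count on the 9.4 build.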

Cheers,

Jeff


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 17:12:32
Message-ID: 53039480.7060401@vmware.com

On 02/18/2014 06:27 PM, Jeff Janes wrote:
> On Tue, Feb 18, 2014 at 3:49 AM, MauMau <maumau307(at)gmail(dot)com> wrote:
>
>> --- or in other words, greater variance in response times. With my limited
>> understanding, that sounds like a problem for latency-sensitive users.
>
> If you need the throughput provided by 9.4, then using 9.3 gets you lower
> variance simply by refusing to do 80% of the assigned work. If you don't
> need the throughput provided by 9.4, then you probably have some natural
> throttling in place.
>
> If you want a real-world-like test, you might try to crank up the -c and -j
> to the limit in 9.3 in a vain effort to match 9.4's performance, and see
> what that does to max latency. (After all, that is what a naive web app is
> likely to do--continue to make more and more connections as requests come
> in faster than they can finish.)

You're missing MauMau's point. In essence, he's comparing two systems
with the same number of clients, issuing queries as fast as they can,
and one can do 2000 TPS while the other one can do 10000 TPS. You would
expect the lower-throughput system to have a *higher* average latency.
Each query takes longer, that's why the throughput is lower. If you look
at the avg_latency columns in the graphs
(http://hlinnaka.iki.fi/xloginsert-scaling/padding/), that's exactly
what you see.

But what MauMau is pointing out is that the *max* latency is much higher
in the system that can do 10000 TPS. So some queries are taking much
longer, even though on average the latency is lower. In an ideal,
totally fair system, each query would take the same amount of time to
execute, and after it's saturated, increasing the number of clients just
makes that constant latency higher.

Yeah, I'm pretty sure that's because of the extra checkpoints. If you
look at the individual test graphs, there are clear spikes in latency,
but the latency is otherwise small. With a higher TPS, you reach
checkpoint_segments quicker; I should've eliminated that effect in the
tests I ran...
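
As rough back-of-the-envelope arithmetic (the per-transaction WAL volume
here is a made-up ballpark figure, and varies a lot in practice, especially
right after a checkpoint when full-page writes dominate): assuming ~600
bytes of WAL per pgbench transaction and checkpoint_segments = 128, i.e.
about 2 GB of WAL between checkpoints, a checkpoint cycle lasts roughly
2 GB / (3000 * 600 B/s) ~= 20 minutes at 3000 TPS, but only ~4 minutes at
14000 TPS. So the high-throughput runs simply hit several times as many
checkpoint-induced latency spikes within the same test duration.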

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 20:51:03
Message-ID: 20140218205103.GB24560@alap3.anarazel.de

On 2014-02-18 19:12:32 +0200, Heikki Linnakangas wrote:
> You're missing MauMau's point. In essence, he's comparing two systems with
> the same number of clients, issuing queries as fast as they can, and one can
> do 2000 TPS while the other one can do 10000 TPS. You would expect the
> lower-throughput system to have a *higher* average latency. Each query takes
> longer, that's why the throughput is lower. If you look at the avg_latency
> columns in the graphs (http://hlinnaka.iki.fi/xloginsert-scaling/padding/),
> that's exactly what you see.
>
> But what MauMau is pointing out is that the *max* latency is much higher in
> the system that can do 10000 TPS. So some queries are taking much longer,
> even though on average the latency is lower. In an ideal, totally fair
> system, each query would take the same amount of time to execute, and after
> it's saturated, increasing the number of clients just makes that constant
> latency higher.

Consider me enthusiastically unenthusiastic about that fact. The change in
throughput still makes this pretty uninteresting: there are so many things
influenced by a factor-of-five increase in throughput that a change in max
latency really doesn't say much.
There's also the point that at five times the throughput it becomes more
likely that a backend sleeps while holding critical locks and such.

> Yeah, I'm pretty sure that's because of the extra checkpoints. If you look
> at the individual test graphs, there are clear spikes in latency, but the
> latency is otherwise small. With a higher TPS, you reach checkpoint_segments
> quicker; I should've eliminated that effect in the tests I ran...

I don't think that'd be a good idea. The number of full page writes so
greatly influences the WAL characteristics that changing
checkpoint_segments would make the tests much harder to compare.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 21:01:08
Message-ID: 5303CA14.8070806@vmware.com

On 02/18/2014 10:51 PM, Andres Freund wrote:
> On 2014-02-18 19:12:32 +0200, Heikki Linnakangas wrote:
>> Yeah, I'm pretty sure that's because of the extra checkpoints. If you look
>> at the individual test graphs, there are clear spikes in latency, but the
>> latency is otherwise small. With a higher TPS, you reach checkpoint_segments
>> quicker; I should've eliminated that effect in the tests I ran...
>
> I don't think that'd be a good idea. The number of full page writes so
> greatly influences the WAL characteristics that changing
> checkpoint_segments would make the tests much harder to compare.

I was just thinking of bumping up checkpoint_segments so high that there
are no checkpoints during any of the tests.
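
Something like this in postgresql.conf, as a sketch (the exact numbers
don't matter, as long as neither limit is reached within a test run):

checkpoint_segments = 1000
checkpoint_timeout = 1h

With 16 MB segments that allows ~16 GB of WAL between checkpoints, which
should be plenty for a few-minute pgbench run even at the highest TPS.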

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 21:05:04
Message-ID: 20140218210504.GC28858@alap3.anarazel.de

On 2014-02-18 23:01:08 +0200, Heikki Linnakangas wrote:
> On 02/18/2014 10:51 PM, Andres Freund wrote:
> >On 2014-02-18 19:12:32 +0200, Heikki Linnakangas wrote:
> >>Yeah, I'm pretty sure that's because of the extra checkpoints. If you look
> >>at the individual test graphs, there are clear spikes in latency, but the
> >>latency is otherwise small. With a higher TPS, you reach checkpoint_segments
> >>quicker; I should've eliminated that effect in the tests I ran...
> >
> >I don't think that'd be a good idea. The number of full page writes so
> >greatly influences the WAL characteristics that changing
> >checkpoint_segments would make the tests much harder to compare.
>
> I was just thinking of bumping up checkpoint_segments so high that there are
> no checkpoints during any of the tests.

Hm. I actually think that full page writes are an interesting part of
this because they are sized so differently from normal records.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-18 21:30:05
Message-ID: CAMkU=1xc3bbPgffP1BPGNv9HGSL-HZdmJLCXqgqh7_J7BQd7kw@mail.gmail.com

On Tue, Feb 18, 2014 at 9:12 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:

> On 02/18/2014 06:27 PM, Jeff Janes wrote:
>
>> On Tue, Feb 18, 2014 at 3:49 AM, MauMau <maumau307(at)gmail(dot)com> wrote:
>>
>>> --- or in other words, greater variance in response times. With my limited
>>> understanding, that sounds like a problem for latency-sensitive users.
>>>
>>
>> If you need the throughput provided by 9.4, then using 9.3 gets you lower
>> variance simply by refusing to do 80% of the assigned work. If you don't
>> need the throughput provided by 9.4, then you probably have some natural
>> throttling in place.
>>
>> If you want a real-world-like test, you might try to crank up the -c and -j
>> to the limit in 9.3 in a vain effort to match 9.4's performance, and see
>> what that does to max latency. (After all, that is what a naive web app is
>> likely to do--continue to make more and more connections as requests come
>> in faster than they can finish.)
>>
>
> You're missing MauMau's point. In essence, he's comparing two systems with
> the same number of clients, issuing queries as fast as they can, and one
> can do 2000 TPS while the other one can do 10000 TPS. You would expect the
> lower-throughput system to have a *higher* average latency. Each query
> takes longer, that's why the throughput is lower. If you look at the
> avg_latency columns in the graphs
> (http://hlinnaka.iki.fi/xloginsert-scaling/padding/), that's exactly what
> you see.
>
> But what MauMau is pointing out is that the *max* latency is much higher
> in the system that can do 10000 TPS. So some queries are taking much
> longer, even though on average the latency is lower. In an ideal, totally
> fair system, each query would take the same amount of time to execute, and
> after it's saturated, increasing the number of clients just makes that
> constant latency higher.
>

I thought that this was the point I was making, not the point I was
missing. You have the same hard drives you had before, but now due to a
software improvement you are cramming 5 times more stuff through them.
Yeah, you will get bigger latency spikes. Why wouldn't you? You are now
beating the snot out of your hard drives, whereas before you were not.

If you need 10,000 TPS, then you need to upgrade to 9.4. If you need it
with low maximum latency as well, then you probably need to get better IO
hardware as well (maybe not--maybe more tuning could help). With 9.3 you
didn't need better IO hardware, because you weren't capable of maxing out
what you already had. With 9.4 you can max it out, and this is a good
thing.

If you need 10,000 TPS but only 2000 TPS are completing under 9.3, then
what is happening to the other 8000 TPS? Whatever is happening to them, it
must be worse than a latency spike.

On the other hand, if you don't need 10,000 TPS, then measuring max latency
at 10,000 TPS is the wrong thing to measure.

Cheers,

Jeff


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
Cc: "Andres Freund" <andres(at)2ndquadrant(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do you know the reason for increased max latency due to xlog scaling?
Date: 2014-02-19 14:43:21
Message-ID: FAE78AEDDE2F45E0863E8ACAB3CF1AAD@maumau

From: "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>
> I thought that this was the point I was making, not the point I was
> missing. You have the same hard drives you had before, but now due to a
> software improvement you are cramming 5 times more stuff through them.
> Yeah, you will get bigger latency spikes. Why wouldn't you? You are now
> beating the snot out of your hard drives, whereas before you were not.
>
> If you need 10,000 TPS, then you need to upgrade to 9.4. If you need it
> with low maximum latency as well, then you probably need to get better IO
> hardware as well (maybe not--maybe more tuning could help). With 9.3 you
> didn't need better IO hardware, because you weren't capable of maxing out
> what you already had. With 9.4 you can max it out, and this is a good
> thing.
>
> If you need 10,000 TPS but only 2000 TPS are completing under 9.3, then
> what is happening to the other 8000 TPS? Whatever is happening to them, it
> must be worse than a latency spike.
>
> On the other hand, if you don't need 10,000 TPS, then measuring max
> latency at 10,000 TPS is the wrong thing to measure.

Thank you, I think I've got the point --- you mean the hard disk for WAL is
the bottleneck. But I still wonder why the latency spike became so much
bigger even with fewer clients than CPU cores. I would expect requests to
be processed more smoothly when the number of simultaneous requests is
small. Anyway, I want to believe the latency spike would become
significantly smaller on an SSD.

Regards
MauMau