Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Date: 2013-06-10 10:40:04
Message-ID: alpine.DEB.2.02.1306101010220.12980@localhost6.localdomain6
Lists: pgsql-hackers


Hello Greg,

Thanks for this very detailed review and the suggestions!

I'll submit a new patch.

>> Question 1: should it report the maximum lag encountered?
>
> I haven't found the lag measurement to be very useful yet, outside of
> debugging the feature itself. Accordingly I don't see a reason to add even
> more statistics about the number outside of testing the code. I'm seeing
> some weird lag problems that this will be useful for though right now, more
> on that a few places below.

I'll explain below why it is really interesting to have this figure, and
why it is not available with the same precision anywhere else.

>> Question 2: the next step would be to have the current lag shown under
>> option --progress, but that would mean having a combined --throttle
>> --progress patch submission, or maybe dependencies between patches.
>
> This is getting too far ahead.

Ok!

> Let's get the throttle part nailed down before introducing even more
> moving parts into this. I've attached an updated patch that changes a
> few things around already. I'm not done with this yet and it needs some
> more review before commit, but it's not too far away from being ready.

Ok. I'll submit a new version by the end of the week.

> This feature works quite well. On a system that will run at 25K TPS without
> any limit, I did a run with 25 clients and a rate of 400/second, aiming at
> 10,000 TPS, and that's what I got:
>
> number of clients: 25
> number of threads: 1
> duration: 60 s
> number of transactions actually processed: 599620
> average transaction lag: 0.307 ms
> tps = 9954.779317 (including connections establishing)
> tps = 9964.947522 (excluding connections establishing)
>
> I never thought of implementing the throttle like this before,

Stochastic processes are a little bit magic:-)

> but it seems to work out well so far. Check out tps.png to see the
> smoothness of the TPS curve (the graphs came out of pgbench-tools.
> There's a little more play outside of the target than ideal for this
> case. Maybe it's worth tightening the Poisson curve a bit around its
> center?

The point of a Poisson distribution is to model random events which arrive
somewhat irregularly, such as web requests or clients queuing at a taxi
stand. I cannot really change the formula, but if you want to argue with
Siméon Denis Poisson, his current address is the 19th division of the
"Père Lachaise" cemetery in Paris:-)

More seriously, the only parameter that can be changed is the "1000000.0"
constant, which drives the granularity of the Poisson process. A smaller
value means a smaller potential multiplier, that is, a bound on how far
from the average time the schedule can stray. This may count as
"tightening", although it would depart from a "perfect" process and might
possibly be a little less "smooth"... for a given definition of "tight",
"perfect" and "smooth":-)

> [...] What I did instead was think of this as a transaction rate target,
> which makes the help a whole lot simpler:
>
> -R SPEC, --rate SPEC
> target rate per client in transactions per second

Ok, I'm fine with this name.

> Made the documentation easier to write too. I'm not quite done with that
> yet, the docs wording in this updated patch could still be better.

I'm not a native English speaker, so any help is welcome here. I'll do my
best.

> I personally would like this better if --rate specified a *total* rate across
> all clients.

Ok, I can do that, with some reworking so that the stochastic process is
shared by all threads instead of living within each client. This means
adding a lock between threads to access some shared variables, which
should not impact the test much. Another option is to keep a per-thread
stochastic process.

> However, there are examples of both types of settings in the
> program already, so there's no one precedent for which is right here. -t is
> per-client and now -R is too; I'd prefer it to be like -T instead. It's not
> that important though, and the code is cleaner as it's written right now.
> Maybe this is better; I'm not sure.

I like the idea of just one process instead of a per-client one. I did not
try that at the beginning because the implementation is less
straightforward; a possible shape is sketched below.
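
Just as a sketch, with names of my own choosing rather than the patch's:
all threads would claim their next start time from a single mutex-protected
schedule, so the combined arrivals form one Poisson stream at the total
target rate.

#include <pthread.h>
#include <stdint.h>

int64_t next_schedule_delay(double avg_delay_us);   /* previous sketch */

/* hypothetical scheduler shared by all threads */
typedef struct
{
    pthread_mutex_t lock;
    int64_t         next_start_us;   /* schedule of the next transaction */
    double          avg_delay_us;    /* 1000000.0 / total target rate */
} SharedSchedule;

/* returns the scheduled start time of the caller's next transaction */
int64_t
claim_next_start(SharedSchedule *s)
{
    int64_t start;

    pthread_mutex_lock(&s->lock);
    start = s->next_start_us;
    s->next_start_us += next_schedule_delay(s->avg_delay_us);
    pthread_mutex_unlock(&s->lock);

    return start;
}

The critical section is tiny, so the lock should indeed not impact the
measured throughput much.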

> On the topic of this weird latency spike issue, I did see that show up in
> some of the results too.

Your example illustrates *exactly* why the lag measure was added.

The Poisson process generates an ideal event line (that is, irregularly
scheduled transaction start times targeting the expected tps) which
induces a varying load that the database is trying to handle.

If a transaction cannot start right away, it means it is deferred with
respect to its scheduled start time. The measured lag reports exactly
that: the clients cannot keep up with the load. There may be some catch-up
later, that is, the clients come back in line with the scheduled
transactions.

I need to take this measure here because the "scheduled time" is only
known to pgbench and not available anywhere else. The maximum would really
be more interesting than the mean, so as to catch that something was
temporarily amiss, even if things went back to nominal later.
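
In sketch form (again with hypothetical names), the measurement just
compares the scheduled start time with the actual one, and the maximum
could be kept next to the running sum already used for the mean:

#include <stdint.h>

/* hypothetical lag accounting, kept per thread */
typedef struct
{
    int64_t lag_sum_us;   /* for the reported average */
    int64_t lag_max_us;   /* proposed maximum, to catch transient stalls */
    int64_t lag_count;
} LagStats;

void
record_lag(LagStats *st, int64_t scheduled_us, int64_t actual_start_us)
{
    /* how far behind schedule this transaction actually started */
    int64_t lag = actual_start_us - scheduled_us;

    if (lag < 0)
        lag = 0;          /* started on time: no lag */

    st->lag_sum_us += lag;
    st->lag_count++;
    if (lag > st->lag_max_us)
        st->lag_max_us = lag;
}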

> Here's one where I tried to specify a rate higher
> than the system can actually handle, 80000 TPS total on a SELECT-only test
>
> $ pgbench -S -T 30 -c 8 -j 4 -R10000tps pgbench
> starting vacuum...end.
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 8
> number of threads: 4
> duration: 30 s
> number of transactions actually processed: 761779
> average transaction lag: 10298.380 ms

The interpretation is the following: as the database cannot handle the
load, transactions were processed on average 10 seconds behind their
scheduled start time. You had, on average, a 10 second latency to answer
"incoming" requests. As a rough check, each client completed about 3200
tps against its 10000 tps target, so a transaction finishing at time t was
scheduled around 0.32*t and thus lags by about 0.68*t; averaged over the
30 second run this gives roughly 10 seconds, which matches the reported
value. Also, some transactions were implicitly never even scheduled, so
the situation is actually worse than that...

> tps = 25392.312544 (including connections establishing)
> tps = 25397.294583 (excluding connections establishing)
>
> It was actually limited by the capabilities of the hardware, 25K TPS. 10298
> ms of lag per transaction can't be right though.
>
> Some general patch submission suggestions for you as a new contributor:

Hmmm, I did a few things such as "pgxs" back in 2004, so maybe "not very
active" is a better description than "new":-)

> -When re-submitting something with improvements, it's a good idea to add a
> version number to the patch so reviewers can tell them apart easily. But
> there is no reason to change the subject line of the e-mail each time. I
> followed that standard here. If you updated this again I would name the file
> pgbench-throttle-v9.patch but keep the same e-mail subject.

Ok.

> -There were some extra carriage return characters in your last submission.
> Wasn't a problem this time, but if you can get rid of those that makes for a
> better patch.

Ok.

--
Fabien.
