Re: gaussian distribution pgbench

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Peter Geoghegan <pg(at)heroku(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2013-12-20 01:23:25
Message-ID: 52B39C0D.2020606@gmail.com
Lists: pgsql-hackers

On 12/19/13 5:52 PM, Gavin Flower wrote:
> Curious, wouldn't the common usage pattern tend to favour a skewed
> distribution, such as the Poisson Distribution (it has been over 40
> years since I studied this area, so there may be better candidates).
>

Some people like database load testing with a "Pareto principle"
distribution, where 80% of the activity hammers 20% of the rows such
that locking becomes important. (That's one specific form of the Pareto
distribution.) The standard pgbench load indirectly gets you quite a bit
of that due to all the contention on the branches table. Targeting all
of that at a single table can be more realistic.
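
To make the 80/20 idea concrete, here is a rough sketch of one way to pick
row ids with that skew. It is not anything pgbench does today, and it is a
simple two-bucket hot/cold split rather than a true Pareto distribution. The
names pick_row and hot_share, and the choice of the lowest-numbered ids as the
"hot" rows, are assumptions made up for illustration.

```python
import random


def pick_row(num_rows, rng, hot_fraction=0.2, hot_probability=0.8):
    """Return a 1-based row id, favoring a small "hot" set of rows.

    Assumption for illustration: the hot rows are simply ids 1..hot_count.
    """
    hot_count = max(1, int(num_rows * hot_fraction))
    # If every row is hot there is no cold range to draw from.
    if hot_count == num_rows or rng.random() < hot_probability:
        return rng.randint(1, hot_count)
    return rng.randint(hot_count + 1, num_rows)


def hot_share(num_rows, samples, seed=0):
    """Measure what fraction of sampled ids land in the hot set."""
    rng = random.Random(seed)
    hot_count = max(1, int(num_rows * 0.2))
    hits = sum(
        1 for _ in range(samples) if pick_row(num_rows, rng) <= hot_count
    )
    return hits / samples


if __name__ == "__main__":
    # Should print a value close to 0.8.
    print(hot_share(100_000, 50_000))
```

Every id in the hot range is equally likely here, so contention is spread
evenly across 20% of the table. A real Pareto or Zipf-style generator would
concentrate even more of the load on a handful of rows.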

My last round of reviewing a pgbench change left me pretty worn out on
extending that code much further. Adding in some new probability
distributions would be fine though; that's a narrow change. We
shouldn't get too excited about pgbench remaining a great tool for
too much longer though. pgbench is fast approaching a wall nowadays,
where it's hard for any single client machine to fully overload one of
today's larger servers. You basically need a second large server to
generate load, whereas what people really want is a bunch of coordinated
small clients. (That sort of wall was in early versions too; it just got
pushed upward a lot by the multi-worker changes in 9.0 coming around the
same time desktop core counts really skyrocketed.)

pgbench started as a clone of a now-abandoned Java project called
JDBCBench. I've been seriously considering a move back toward that
direction lately. Nowadays spinning up ten machines to run load
generation is trivial. The idea of extending pgbench's C code to
support multiple clients running at the same time and collating all of
their results is not a project I'd be excited about. It should remain a
perfectly fine tool for PostgreSQL developers to find code hotspots, but
that's only so useful.

(At this point someone normally points out Tsung solved all of those
problems years ago if you'd only give it a chance. I think it's kind of
telling that work on sysbench is rewriting the whole thing so you can
use Lua for your test scripts.)
