Re: pgbench--new transaction type

Lists: pgsql-hackers
From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: pgbench--new transaction type
Date: 2011-05-29 19:11:23
Message-ID: BANLkTi=QBYOM+zj=ReQeiEKDyVpKUtHm6Q@mail.gmail.com

If you use "pgbench -S -M prepared" at a scale where all data fits in
memory, most of what you are benchmarking is network/IPC chatter, and
table locking. Which is fine if that is what you want to do. This
patch adds a new transaction type, -P, which does the same thing as
-S but moves the main loop of selects, 10,000 at a time, into a
PL/pgSQL function. This does a good job of exercising the executor
rather than IPC.

This can simulate workloads that have primary-key lookups as the
inner side of a large nested loop. It is also useful for isolating and
profiling parts of the backend code.

I did not implement this as a new query mode (-M plpgsql), because the
lack of transaction control in PL/pgSQL means it can only be used for
select-only transactions rather than as a general method. So I
thought a new transaction type made more sense.
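The patch itself isn't quoted in this thread, but the idea can be sketched as a PL/pgSQL function. The function name, signature, and exact aid formula below are illustrative guesses, not the patch's actual code:

```sql
-- Hypothetical sketch of the -P idea: run the -S primary-key lookups in a
-- server-side loop, so one client/server round trip covers many selects.
-- Names and the lookup formula are assumptions for illustration only.
CREATE OR REPLACE FUNCTION pgbench_select_loop(naccounts integer,
                                               loops integer)
RETURNS bigint LANGUAGE plpgsql AS $$
DECLARE
    total bigint := 0;
    bal   integer;
BEGIN
    FOR i IN 1 .. loops LOOP
        SELECT abalance INTO bal
          FROM pgbench_accounts
         WHERE aid = 1 + floor(random() * naccounts)::integer;
        total := total + coalesce(bal, 0);
    END LOOP;
    RETURN total;  -- return something so the selects cannot be skipped
END;
$$;

-- One -P "transaction" then becomes a single statement, e.g.:
-- SELECT pgbench_select_loop(:scale * 100000, 10000);
```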

I didn't implement it as a custom file using -f because:
1) It seems to be a natural extension of the existing built-ins. Also,
-f is fiddly. Several times I've wanted to ask posters who are
discussing the other built-in transactions to run something like this
and report back, which is easier to do if it is also built in.
2) It needs initialization code (creating the function), which -f does not support.
3) I don't see how I can make it automatically detect and respond to
:scale if it were run under -f.

Perhaps issues 2 and 3 would be best addressed by extending the
general -f facility, but that would be a lot more work, and I don't
know how well received it would be.

The reporting might be an issue. I don't want to call it TPS when it
is not really a transaction being reported, so for now I've left
the TPS as true transactions, and added a separate reporting line
for selects per second.

I know I also need to add to the web-docs, but I'm hoping to wait on
that until I get some feedback on whether the whole approach is
considered to be viable or not.

some numbers for single client runs on 64-bit AMD Opteron Linux:

12,567 sps under -S
19,646 sps under -S -M prepared
58,165 sps under -P

Cheers,

Jeff

Attachment Content-Type Size
pgbench_v1.patch text/x-patch 4.2 KB

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-05-30 02:04:20
Message-ID: 4DE2FB24.5020507@2ndQuadrant.com

On 05/29/2011 03:11 PM, Jeff Janes wrote:
> If you use "pgbench -S -M prepared" at a scale where all data fits in
> memory, most of what you are benchmarking is network/IPC chatter, and
> table locking.

If you profile it, you'll find a large amount of the time is actually
spent doing more mundane things, like memory allocation. The network
and locking issues are really not the bottleneck at all in a surprising
number of these cases. Your patch isn't really dependent on your being
right about the cause here, which means this doesn't impact your
submission any. Just wanted to clarify that what people expect to be
slowing things down in this situation and what actually shows up when
you profile are usually quite different.

I'm not sure whether this feature makes sense to add to pgbench, but
it's interesting to have it around for developer testing. The way
you've built this isn't messing with the code too much to accomplish
that, and your comments about it being hard to do this using "-f" are
all correct. Using a custom test file tends to aim at your foot unless
you apply a strong grip to point it elsewhere.

> some numbers for single client runs on 64-bit AMD Opteron Linux:
> 12,567 sps under -S
> 19,646 sps under -S -M prepared
> 58,165 sps under -P
>

10,000 is too big of a burst to run at once. The specific thing I'm
concerned about is what happens if you try this mode when using "-T" to
enforce a runtime limit, and your SELECT rate isn't high. If you're
only doing 100 SELECTs/second because your scale is big enough to be
seek bound, you could overrun by nearly two minutes.

I think this is just a matter of turning the optimization around a bit.
Rather than starting with a large batch size and presuming that's ideal,
in this case a different approach is really needed. You want the
smallest batch size that gives you a large win here. My guess is that
most of the gain here comes from increasing batch size to something in
the 10 to 100 range; jumping to 10K is probably overkill. Could you try
some smaller numbers and see where the big increases start falling off at?

Obligatory code formatting nitpick: try not to overrun the right margin
any further than the existing code around line 1779, where you add more
ttype comments. That needs to turn into a multi-line comment. Rest of
the patch looks fine, and don't worry about resubmitting for that; just
something to tweak on your next update. A slightly more descriptive
filename for the patch would help out those of us who look at a lot of
pgbench patches, too. Something like "pgbench_loop_v1.patch" for
example would make it easier for me to remember which patch this was by
its name.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-11 19:21:46
Message-ID: BANLkTi=adoTxfqyb6Kh2p64Zyyoz4dZv+A@mail.gmail.com

On Sun, May 29, 2011 at 7:04 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 05/29/2011 03:11 PM, Jeff Janes wrote:
>>
>> If you use "pgbench -S -M prepared" at a scale where all data fits in
>> memory, most of what you are benchmarking is network/IPC chatter, and
>> table locking.
>
> If you profile it, you'll find a large amount of the time is actually spent
> doing more mundane things, like memory allocation.  The network and locking
> issues are really not the bottleneck at all in a surprising number of these
> cases.

I wouldn't expect IPC chatter to show up in profiling, because it
costs wall time, but not CPU time. The time spent might be attributed
to the kernel, or to pgbench, or to nothing at all.

As part of the "Eviscerating the parser" discussion, I made a hack
that had exec_simple_query do nothing but return a dummy
completionTag. So there was no parsing, planning, or execution.
Under this mode, I got 44,758 TPS, or 22.3 microseconds/transaction,
which should represent the cost of IPC chatter and pgbench overhead.

The breakdown I get, in microseconds per item, is:

53.70  cost of a select and related overhead via -S -M prepared, of which:
22.34  cost of IPC and the pgbench round trip, estimated via the hack above
16.91  cost of the actual execution of the select statement, estimated
       via the newly suggested -P mode
--------
14.45  residual cost, covering table locking, transaction begin and
       end, plus measurement error

Because all my tests were single-client, the cost of locking would be
much lower than it would be in contended cases. However, I wouldn't
trust profiling to accurately reflect the locking time anyway, for the
same reason I don't trust it for IPC chatter: wall time is consumed
but not spent on the CPU, so it is not counted by profiling.

As you note, memory allocation accounts for much of the profiling
time. However, memory allocation is a low-level operation that is
always in support of some higher purpose, such as parsing, execution,
or taking locks. My top-down approach attempts to assign these
bottom-level costs to the proper higher-level purpose.

> Your patch isn't really dependent on your being right about the
> cause here, which means this doesn't impact your submission any.  Just
> wanted to clarify that what people expect to be slowing things down in this
> situation and what actually shows up when you profile are usually quite
> different.
>
> I'm not sure whether this feature makes sense to add to pgbench, but it's
> interesting to have it around for developer testing.

Yes, this is what I thought the opinion might be. But there is no
repository of such "useful for developer testing" patches, other than
this mailing list. That being the case, would it even be worthwhile
to fix it up more and submit it to commitfest?

>> some numbers for single client runs on 64-bit AMD Opteron Linux:
>> 12,567 sps  under -S
>> 19,646 sps  under -S -M prepared
>> 58,165 sps  under -P
>>
>
> 10,000 is too big of a burst to run at once.  The specific thing I'm
> concerned about is what happens if you try this mode when using "-T" to
> enforce a runtime limit, and your SELECT rate isn't high.  If you're only
> doing 100 SELECTs/second because your scale is big enough to be seek bound,
> you could overrun by nearly two minutes.

OK. I wouldn't expect someone to want to use -P at scales that
cause that to happen, but perhaps it should deal with it more
gracefully. I picked 10,000 just because it is obviously large enough
for any hardware I have, or will have, for the foreseeable future.

> I think this is just a matter of turning the optimization around a bit.
>  Rather than starting with a large batch size and presuming that's ideal, in
> this case a different approach is really needed.  You want the smallest
> batch size that gives you a large win here.  My guess is that most of the
> gain here comes from increasing batch size to something in the 10 to 100
> range; jumping to 10K is probably overkill.  Could you try some smaller
> numbers and see where the big increases start falling off at?

I've now tried a variety of sizes (powers of 2 up to 8192, plus
10,000), and the results fit a linear equation very well, with 17.3
usec per inner select plus 59.0 usec per outer invocation.

So at a loop size of 512, you would have overhead of 59.0/512 = 0.115
usec out of a total of about 17.4 usec per select, or 0.7% overhead.
So 512 should be large enough.
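As a sanity check on those figures, the fitted model t(b) = 17.3 + 59.0/b usec per inner select can be evaluated at a few batch sizes with plain SQL arithmetic (the constants come from the fit above; the query itself is just an illustration):

```sql
-- Predicted per-select overhead and overhead percentage for batch size b,
-- using the fitted constants 17.3 usec/select and 59.0 usec/invocation.
SELECT b,
       round((59.0 / b)::numeric, 3)                            AS overhead_usec,
       round((59.0 / b / (17.3 + 59.0 / b) * 100)::numeric, 2)  AS overhead_pct
FROM (VALUES (10), (100), (512), (10000)) AS sizes(b);
```

At b = 512 this gives roughly 0.115 usec and 0.7% overhead, matching the numbers above; at b = 10 the overhead is still about 25%, which supports picking a batch size in the hundreds.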

Thanks for the other suggestions.

Cheers,

Jeff


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-13 04:38:52
Message-ID: 4DF5945C.9010904@2ndQuadrant.com

On 06/11/2011 03:21 PM, Jeff Janes wrote:
> I wouldn't expect IPC chatter to show up in profiling, because it
> costs wall time, but not CPU time. The time spent might be attributed
> to the kernel, or to pgbench, or to nothing at all.
>

Profilers aren't necessarily just accumulating raw CPU time though. If
the approach includes sampling "what code is active right now?"
periodically, you might be able to separate this out even though it's
not using CPU time in the normal fashion. I think you might just need
to use a better profiler.

Anyway, the sort of breakdown this helps produce is valuable
regardless. I highlighted the statement you made because I reflexively
challenge theorizing about code hotspots on general principle. The
measurement-based breakdown you provided was more what I look for.

> But there is no
> repository of such "useful for developer testing" patches, other than
> this mailing list. That being the case, would it even be worthwhile
> to fix it up more and submit it to commitfest?
>

The activity around profiling pgbench and trying to crack one of the
bottlenecks has been picking up a lot of momentum lately, and if we're
lucky that will continue throughout 9.2 development. As such, now seems
as good a time as any to consider adding something like this. We may end
up reskinning lots of pgbench before this is over. I added your patch to
the CommitFest.

> So at a loop size of 512, you would have overhead of 59.0/512 = 0.115 usec
> out of a total of about 17.4 usec per select, or 0.7% overhead. So 512
> should be large enough.
>

That I think is workable. If the split was a compile time constant
fixed at 512 unless you specifically changed it, even the worst typical
cases shouldn't suffer much from batch overhang. If you create a
database so large that you only get 50 TPS, which is unusual but not
that rare, having a 512 execution batch size would mean you might get
your "-T" set end time lagging 10 seconds behind its original plan.
Unlike the 10K you started with, that is reasonable; that does sound
like the sweet spot where overhead is low but time overrun isn't too
terrible.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-13 08:48:39
Message-ID: 4DF5CEE7.3030500@wulczer.org

On 13/06/11 06:38, Greg Smith wrote:
> On 06/11/2011 03:21 PM, Jeff Janes wrote:
>> I wouldn't expect IPC chatter to show up in profiling, because it
>> costs wall time, but not CPU time. The time spent might be attributed
>> to the kernel, or to pgbench, or to nothing at all.
>>
>
> Profilers aren't necessarily just accumulating raw CPU time though. If
> the approach includes sampling "what code is active right now?"
> periodically, you might be able to separate this out even though it's
> not using CPU time in the normal fashion. I think you might just need
> to use a better profiler.

I have gotten surprisingly insightful results in the past using
http://poormansprofiler.org/

I have never used it with Postgres, but it might be worth a try.

J


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-19 22:30:39
Message-ID: 4DFE788F.5020704@2ndQuadrant.com

I applied Jeff's patch but changed this to address concerns about the
program getting stuck running for too long in the function:

#define plpgsql_loops 512

This would be better named "plpgsql_batch_size" or something similar;
the current name suggests it's how many loops to run, which is
confusing.

My main performance concern here was whether this change still matters
as much once a larger number of clients is involved. Some of the
other things you can do to optimize single-client performance aren't as
useful with lots of them. Here's how the improvements in this mode
worked for me on a server with 4 Hyper-Threaded cores (i870);
shared_buffers=256MB, scale=100:

1 client:
-S: 11533
-S -M prepared: 19498
-P: 49547

12 clients, 4 workers:
-S: 56052
-S -M prepared: 82043
-P: 159443

96 clients, 8 workers:
-S: 49940
-S -M prepared: 76099
-P: 137942

I think this is a really nice new workload to demonstrate. One of the
things we tell people is that code works much faster when moved
server-side, but how much faster isn't always easy to show. Having this
mode available lets them see how dramatic that can be quite easily. I
know I'd like to be able to run performance tests for clients of new
hardware using PostgreSQL and tell them something like this: "With
simple clients executing a statement at a time, this server reaches 56K
SELECTs/second. But using server-side functions to execute them in
larger batches it can do 159K".

The value this provides as an alternate source of benchmark
load generation, with a very different profile for how it exercises the
server, is good too.

Things to fix in the patch before it would be a commit candidate:

-Adjust the loop size/name, per above
-Reformat some of the longer lines to try and respect the implied right
margin in the code formatting
-Don't include the "plgsql function created." line unless in debugging mode.
-Add the docs. Focus on how this measures how fast the database can
execute SELECT statements using server-side code. An explanation that
the "transaction" block size is 512 is important to share. It also
needs a warning that time based runs ("-T") may have to wait for a block
to finish and go beyond its normally expected end time.
-The word "via" in the "transaction type" output description is probably
not the best choice. Changing to "SELECT only using PL/pgSQL" would
translate better, and follow the standard case use for the name of that
language.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-20 04:34:18
Message-ID: BANLkTinfnZnyjxv73VKAwQNT3YXFmpby2Q@mail.gmail.com

On Mon, Jun 20, 2011 at 07:30, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I applied Jeff's patch but changed this to address concerns about the
> program getting stuck running for too long in the function:
>
> #define plpgsql_loops   512

Is it OK to define the value as a constant?

Also, it executes many more queries if the -t option (transactions) is
specified; it does run the specified number of "transactions", but each
one actually runs plpgsql_loops queries, unlike the other modes.

> I think this is a really nice new workload to demonstrate.  One of the
> things we tell people is that code works much faster when moved server-side,

What is the most important part of the changes? The proposal includes
3 improvements, and bundling them might obscure which one is the most
valuable tuning point.

#1 Execute multiple queries in one transaction.
#2 Run multiple queries in the server with stored procedure.
#3 Return only one value instead of 512.

Anyway, I'm not sure we need to include the query mode in pgbench's
code. Instead, how about providing "a sample script" as a separate sql
file? pgbench can execute any script file with the -f option.

--
Itagaki Takahiro


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-20 18:45:24
Message-ID: 4DFF9544.3030204@2ndquadrant.com

Itagaki Takahiro wrote:
> Anyway, I'm not sure we need to include the query mode in pgbench's
> code. Instead, how about providing "a sample script" as a separate sql
> file? pgbench can execute any script file with the -f option.
>

When you execute using "-f", it doesn't correctly detect the database
scale. Also, the really valuable thing here is seeing the higher
selects-per-second number come out in the report. I just realized that
neither Jeff nor I ever included an example of the output in the new
mode, which helps explain some of why the patch is built the way it is:

$ pgbench -c 12 -j 4 -T 30 -P pgbench
plgsql function created.
starting vacuum...end.
transaction type: SELECT only via plpgsql
scaling factor: 100
query mode: simple
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 9342
tps = 311.056293 (including connections establishing)
tps = 311.117886 (excluding connections establishing)
selects per second = 159260.822100 (including connections establishing)
selects per second = 159292.357672 (excluding connections establishing)

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-30 04:13:21
Message-ID: BANLkTimuyDhL+cuvvSf7M7wY1siJjoZqXQ@mail.gmail.com

On Sun, Jun 19, 2011 at 3:30 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
...
>
> Things to fix in the patch before it would be a commit candidate:
>
> -Adjust the loop size/name, per above
> -Reformat some of the longer lines to try and respect the implied right
> margin in the code formatting
> -Don't include the "plgsql function created." line unless in debugging mode.
> -Add the docs.  Focus on how this measures how fast the database can execute
> SELECT statements using server-side code.  An explanation that the
> "transaction" block size is 512 is important to share.  It also needs a
> warning that time based runs ("-T") may have to wait for a block to finish
> and go beyond its normally expected end time.
> -The word "via" in the "transaction type" output description is probably not
> the best choice.  Changing to "SELECT only using PL/pgSQL" would translate
> better, and follow the standard case use for the name of that language.

Hi Greg,

Thanks for the review. I've realized that I will not get a chance to
incorporate these changes during this commitfest due to work and
travel schedules, so I have changed it to "Returned with Feedback" and
will update it for the next commitfest.

One more thought I had: would it make sense to change this from the
creation of a permanent PL/pgSQL function to instead use the recently
added DO anonymous-block syntax? I think that would be somewhat
cleaner about not leaving cruft behind in the database. But it would
increase the overhead of each outer execution, and would also mean
that it could not be run against servers before 9.0.
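For comparison, a DO-block version of the batch might look roughly like this, sketched under the assumption of the 512-iteration batch size discussed elsewhere in the thread. DO takes no parameters, so the scale would have to be interpolated into the block text before each send (scale 100 is hardcoded here as an example):

```sql
-- Hypothetical DO-block variant: nothing permanent is left in the
-- database, but the block is parsed and planned on every submission,
-- and DO requires PostgreSQL 9.0 or later.
DO $$
DECLARE
    total bigint := 0;
    bal   integer;
BEGIN
    FOR i IN 1 .. 512 LOOP
        SELECT abalance INTO bal
          FROM pgbench_accounts
         WHERE aid = 1 + floor(random() * 10000000)::integer;  -- scale 100 assumed
        total := total + coalesce(bal, 0);
    END LOOP;
    -- note: a DO block has no way to return total to the client
END;
$$;
```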

Cheers,

Jeff


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-30 04:35:49
Message-ID: 4E0BFD25.8020207@2ndQuadrant.com

On 06/30/2011 12:13 AM, Jeff Janes wrote:
> One more thought I had: would it make sense to change this from the
> creation of a permanent PL/pgSQL function to instead use the recently
> added DO anonymous-block syntax? I think that would be somewhat
> cleaner about not leaving cruft behind in the database. But it would
> increase the overhead of each outer execution, and would also mean
> that it could not be run against servers before 9.0.
>

I think some measurement of the overhead difference would be needed to
know for sure about the first part. I suspect that, given the block size
of 512 now being targeted, it would end up not mattering very much.

pgbench's job is to generate a whole database full of cruft, so I can't
say I'd find an argument from either side of that to be very
compelling. I'm not real busy testing the performance of PostgreSQL
instances from before 9.0 anymore either, so whether this mode is
compatible with them isn't very compelling either. Just a mixed
bag all around in those areas.

I would say take a look at what the performance change looks like, and
see if it turns out to make the patch to pgbench less obtrusive. The
main objection against committing this code I can see is that it will
further complicate pgbench for a purpose not many people care about. So
if the DO version ends up with a smaller diff and less impact on the
codebase, that would likely be a more substantial tie-breaker in its
favor than any of these other arguments.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-07-26 00:12:38
Message-ID: CAMkU=1zFesy+MMNcW=xmjF_iza7ZDWZrY58ANJBM=icnocMWFA@mail.gmail.com

On Sun, Jun 19, 2011 at 9:34 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Mon, Jun 20, 2011 at 07:30, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> I applied Jeff's patch but changed this to address concerns about the
>> program getting stuck running for too long in the function:
>>
>> #define plpgsql_loops   512
>
> Is it OK to define the value as a constant?

I think so. I think anyone in a position to want to change it would
not be averse to recompiling.

I consider it analogous to nbranches, ntellers, and naccounts, which
are also constants.

> Also, it executes many more queries if the -t option (transactions) is
> specified; it does run the specified number of "transactions", but each
> one actually runs plpgsql_loops queries, unlike the other modes.

Am I being overly punctilious in maintaining the distinction between a
transaction proper and a select?

In a similar vein, the reporting, where I have both a tps line and a
selects-per-second line, seems a bit messy, but I wanted to be overly
explicit, at least until someone recommends a less confusing alternative.

>> I think this is a really nice new workload to demonstrate.  One of the
>> things we tell people is that code works much faster when moved server-side,
>
> What is the most important part of the changes? The proposal includes
> 3 improvements, and bundling them might obscure which one is the most
> valuable tuning point.
>
>  #1 Execute multiple queries in one transaction.
>  #2 Run multiple queries in the server with stored procedure.
>  #3 Return only one value instead of 512.

#2 is the most important change. The other changes are just "along
for the ride" as a side effect of #2.

I think the #1 issue is probably minor in single-client cases, although
it can avoid major contention in multi-client cases (though recent work
by Robert Haas may alleviate much of that).
Since transactions cannot be started and ended inside server-side
code, I am not able to isolate #1 from the rest of my changes. One
can take the opposite approach, however, by running queries the
normal way except all in one transaction, as a comparison. The
"-1" option of the attached toy patch does that (it applies to head,
with a minor conflict at getopt if applied over the main patch of this
thread). Numbers for various combinations in a single client
(unfortunately, run on a slightly slower CPU than my previous example):

9,164.85 -S
10,144.71 -S -1
13,980.64 -S -M prepared
16,004.97 -S -M prepared -1
39,600.67 -P

I had never even considered #3--it is just an accident of how I wrote
the code. I only returned anything at all because a) in early code I
wanted to see the sum, just as a sanity check that the returned value
seemed reasonable, to indicate it was doing what I thought it was
doing, and b) I was worried some optimizer might decide to avoid
executing the selects altogether, if it detected the results of them
were never used. Should I find a way to return 512 values from a
single function call, either as part of the submitted code, or just as
a side test to show if it makes any difference?
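That side test could be sketched as a set-returning variant of the function. This is again illustrative, not part of the patch; the assumption is that RETURN NEXT is the simplest way to hand all 512 values back:

```sql
-- Hypothetical SETOF variant: RETURN NEXT streams every looked-up
-- balance back to the caller, reintroducing per-row result traffic
-- that the single aggregate return value avoids.
CREATE OR REPLACE FUNCTION pgbench_select_loop_rows(naccounts integer,
                                                    loops integer)
RETURNS SETOF integer LANGUAGE plpgsql AS $$
DECLARE
    bal integer;
BEGIN
    FOR i IN 1 .. loops LOOP
        SELECT abalance INTO bal
          FROM pgbench_accounts
         WHERE aid = 1 + floor(random() * naccounts)::integer;
        RETURN NEXT bal;
    END LOOP;
END;
$$;

-- e.g. SELECT * FROM pgbench_select_loop_rows(:scale * 100000, 512);
```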

> Anyway, I'm not sure we need to include the query mode in pgbench's
> code. Instead, how about providing "a sample script" as a separate sql
> file? pgbench can execute any script file with the -f option.

In the absence of -s and presence of -f, :scale gets set to 1, rather
than to "select count(*) from pgbench_branches".

I don't think it is nice to rely on people to correctly specify -s. I
would like to change -f so that, in the absence of -s, it detects
scale the same way -S and the other built-ins do. But that would
probably be too backwards-incompatible to be an acceptable change.

The other thing would be doing initialization, like creating the
function in this case. Perhaps this could be solved by adding a new
line-prefix category to the -f language. Currently "\" indicates a
metacommand to be done by pgbench itself. Maybe "!" could indicate a
real SQL command, but one that would be submitted only upon reading
the -f file, and not once per execution. This one might be backwards
compatible, as I don't see why anyone would have historical sql files
sitting around with lines starting with "!".

Cheers,

Jeff

Attachment Content-Type Size
pgbench_transact.patch text/x-patch 2.0 KB

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-07-26 01:35:55
Message-ID: 4E2E19FB.5040600@2ndQuadrant.com

On 07/25/2011 08:12 PM, Jeff Janes wrote:
> In the absence of -s and presence of -f, :scale gets set to 1, rather
> than to "select count(*) from pgbench_branches".
>
> I don't think it is nice to rely on people to correctly specify -s. I
> would like to change -f so that in the absence of -s it uses the same
> scale as -S, etc., do. But that would probably be too backwards
> incompatible to be an acceptable change.
>

Auto-detecting scale only works if you have a database populated with
the pgbench tables. You can use "pgbench -f" to run arbitrary bits of
SQL, using pgbench as the driver program for all sorts of benchmarking
tasks against other data sets. For example, at
http://projects.2ndquadrant.it/sites/default/files/pgbench-intro.pdf I
show how to use it for testing how fast INSERT statements of various
sizes can execute.

The very concept of a "scale" may not make sense for other data sets
that pgbench will happily run against when using "-f". The only sort of
heuristic I have considered adding here when running in that mode is:

1) Check if pgbench_branches exists.
2) If so, count the records to derive a scale, as currently done in the
non "-f" cases
3) Should that scale not match the value of "-s", issue a warning.

You have to assume anyone sophisticated enough to be playing with "-f"
may be doing something the program doesn't expect or understand, and let
them do that without trying to "fix" what may be intentional behavior.
But a check for the most common mistake made in this area wouldn't
bother people who aren't using pgbench in its original form at all,
while it would keep those new to the program from screwing this up.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-01 00:02:34
Message-ID: CAMkU=1w0K3RNhtPuLF8WQoVi6gxgG6mcnpC=-iVjwKJKyDPysw@mail.gmail.com

On Sun, Jun 19, 2011 at 3:30 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I applied Jeff's patch but changed this to address concerns about the
> program getting stuck running for too long in the function:
>
> #define plpgsql_loops   512
>
> This would be better named "plpgsql_batch_size" or something similar;
> the current name suggests it's how many loops to run, which is
> confusing.
>
> My main performance concern here was whether this change still matters as
> much once a larger number of clients is involved.  Some of the other
> things you can do to optimize single-client performance aren't as useful
> with lots of them.  Here's how the improvements in this mode worked for me
> on a server with 4 Hyper-Threaded cores (i870); shared_buffers=256MB,
> scale=100:
>
> 1 client:
> -S: 11533
> -S -M prepared: 19498
> -P: 49547
>
> 12 clients, 4 workers:
> -S:  56052
> -S -M prepared: 82043
> -P: 159443
>
> 96 clients, 8 workers:
> -S: 49940
> -S -M prepared: 76099
> -P: 137942
>
> I think this is a really nice new workload to demonstrate.  One of the
> things we tell people is that code works much faster when moved server-side,
> but how much faster isn't always easy to show.  Having this mode available
> lets them see how dramatic that can be quite easily.  I know I'd like to be
> able to run performance tests of new hardware for clients using PostgreSQL
> and tell them something like this:  "With simple clients executing a
> statement at a time, this server reaches 56K SELECTs/second.  But using
> server-side functions to execute them in larger batches it can do 159K".
>
> The value this provides as an alternate source of benchmark load
> generation, with a very different profile for how it exercises the server,
> is good too.
>
> Things to fix in the patch before it would be a commit candidate:
>
> -Adjust the loop size/name, per above
> -Reformat some of the longer lines to try and respect the implied right
> margin in the code formatting
> -Don't include the "plgsql function created." line unless in debugging mode.
> -Add the docs.  Focus on how this measures how fast the database can execute
> SELECT statements using server-side code.  An explanation that the
> "transaction" block size is 512 is important to share.  It also needs a
> warning that time based runs ("-T") may have to wait for a block to finish
> and go beyond its normally expected end time.
> -The word "via" in the "transaction type" output description is probably not
> the best choice.  Changing to "SELECT only using PL/pgSQL" would translate
> better, and follow the standard case use for the name of that language.

Sorry it has taken me a year to get back to this patch. I have wanted
to use it, and to ask other people to run it and report their results,
several times recently, so I would like to get it into the core.

I've attached a new patch which addresses several of your concerns,
and adds the documentation. The description is much longer than the
descriptions of other nearby options, which mostly just give a simple
statement of what they do rather than a description of why that is
useful. I don't know if that means I'm starting a good trend, or a
bad one, or I'm just putting the exposition in the wrong place.

In addition to showing the benefits of coding things on the server
side when that is applicable, it also allows hackers to stress parts
of the server code that are not easy to stress otherwise.

Cheers,

Jeff

Attachment Content-Type Size
pgbench_loop_v2.patch application/octet-stream 5.9 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 07:32:38
Message-ID: 4FE17C96.3090501@enterprisedb.com

On 01.06.2012 03:02, Jeff Janes wrote:
> I've attached a new patch which addresses several of your concerns,
> and adds the documentation. The description is much longer than the
> descriptions of other nearby options, which mostly just give a simple
> statement of what they do rather than a description of why that is
> useful. I don't know if that means I'm starting a good trend, or a
> bad one, or I'm just putting the exposition in the wrong place.
>
> In addition to showing the benefits of coding things on the server
> side when that is applicable, it also allows hackers to stress parts
> of the server code that are not easy to stress otherwise.

As you mentioned in your original email over a year ago, most of this
could be done as a custom script. It's nice to have another workload to
test, but then again, there's an infinite number of workloads that might
be interesting.

You can achieve the same with this custom script:

\set loops 512

do $$ DECLARE sum integer default 0; amount integer; account_id
integer; BEGIN FOR i IN 1..:loops LOOP account_id=1 + floor(random() *
:scale); SELECT abalance into strict amount FROM pgbench_accounts
WHERE aid = account_id; sum := sum + amount; END LOOP; END; $$;

It's a bit awkward because it has to be all on one line, and you don't
get the auto-detection of scale. But those are really the only
differences AFAICS.
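
Unwrapped for readability (pgbench itself still needs the DO statement
on a single line), that script is:

```sql
\set loops 512

do $$
DECLARE
  sum integer default 0;
  amount integer;
  account_id integer;
BEGIN
  FOR i IN 1..:loops LOOP
    account_id := 1 + floor(random() * :scale);
    SELECT abalance INTO STRICT amount
      FROM pgbench_accounts
     WHERE aid = account_id;
    sum := sum + amount;
  END LOOP;
END;
$$;
```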

I think we would be better off improving pgbench to support those things
in custom scripts. It would be nice to be able to write initialization
steps that only run once in the beginning of the test. You could then
put the "SELECT COUNT(*) FROM pgbench_branches" there, to do the scale
auto-detection.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 07:48:22
Message-ID: CA+U5nMKja5HEM+SX6p-pgGgGQq+5Cz7Ds860i1sNyLRmKzYRtA@mail.gmail.com

On 20 June 2012 15:32, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 01.06.2012 03:02, Jeff Janes wrote:
>>
>> I've attached a new patch which addresses several of your concerns,
>> and adds the documentation.  The description is much longer than the
>> descriptions of other nearby options, which mostly just give a simple
>> statement of what they do rather than a description of why that is
>> useful.  I don't know if that means I'm starting a good trend, or a
>> bad one, or I'm just putting the exposition in the wrong place.
>>
>> In addition to showing the benefits of coding things on the server
>> side when that is applicable, it also allows hackers to stress parts
>> of the server code that are not easy to stress otherwise.
>
>
> As you mentioned in your original email over a year ago, most of this could
> be done as a custom script. It's nice to have another workload to test, but
> then again, there's an infinite number of workloads that might be
> interesting.
>
> You can achieve the same with this custom script:
>
> \set loops 512
>
> do $$ DECLARE  sum integer default 0; amount integer; account_id integer;
> BEGIN FOR i IN 1..:loops LOOP   account_id=1 + floor(random() * :scale);
> SELECT abalance into strict amount FROM pgbench_accounts WHERE aid =
> account_id;   sum := sum + amount; END LOOP; END; $$;
>
> It's a bit awkward because it has to be all on one line, and you don't get
> the auto-detection of scale. But those are really the only differences
> AFAICS.
>
> I think we would be better off improving pgbench to support those things in
> custom scripts. It would be nice to be able to write initialization steps
> that only run once in the beginning of the test. You could then put the
> "SELECT COUNT(*) FROM pgbench_branches" there, to do the scale
> auto-detection.

I'm sure Jeff submitted this because of the need for a standard test,
rather than the wish to actually modify pgbench itself.

Can I suggest that we include a list of standard scripts with pgbench
for this purpose? These can then be copied alongside the binary when
we do an install.

That way this patch can become a standard script, plus an entry in the
docs to list this.

We could even include scripts for the usual cases, to allow them to be
more easily viewed/copied/modified.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 17:42:32
Message-ID: CA+TgmoYE9Mo9N4_DopDDTP-L2kF+K=5Lxz1KVo_C=Z-0PsAg0A@mail.gmail.com

On Wed, Jun 20, 2012 at 3:48 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I'm sure Jeff submitted this because of the need for a standard test,
> rather than the wish to actually modify pgbench itself.
>
> Can I suggest that we include a list of standard scripts with pgbench
> for this purpose? These can then be copied alongside the binary when
> we do an install.

I was thinking along similar lines myself. At the least, I think we
can't continue to add a short option for every new test type.
Instead, maybe we could have --test-type=WHATEVER, and perhaps that
then reads whatever.sql from some compiled-in directory. That would
allow us to sanely support a moderately large number of tests.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 18:41:49
Message-ID: CAEYLb_XSuKvc3Kg7Q8sOsPEDPJqHsagRRwOCZLjgVwPJb7o5Sw@mail.gmail.com

On 20 June 2012 18:42, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jun 20, 2012 at 3:48 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> I'm sure Jeff submitted this because of the need for a standard test,
>> rather than the wish to actually modify pgbench itself.
>>
>> Can I suggest that we include a list of standard scripts with pgbench
>> for this purpose? These can then be copied alongside the binary when
>> we do an install.
>
> I was thinking along similar lines myself.  At the least, I think we
> can't continue to add a short option for every new test type.
> Instead, maybe we could have --test-type=WHATEVER, and perhaps that
> then reads whatever.sql from some compiled-in directory.  That would
> allow us to sanely support a moderately large number of tests.

+1. As long as pgbench is considered to be the standard benchmarking
tool (and I think that it is a general problem that it is), we ought
to make an effort to give people more options.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 18:57:10
Message-ID: 4FE21D06.5060300@enterprisedb.com

On 20.06.2012 21:41, Peter Geoghegan wrote:
> On 20 June 2012 18:42, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Jun 20, 2012 at 3:48 AM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>>> I'm sure Jeff submitted this because of the need for a standard test,
>>> rather than the wish to actually modify pgbench itself.
>>>
>>> Can I suggest that we include a list of standard scripts with pgbench
>>> for this purpose? These can then be copied alongside the binary when
>>> we do an install.
>>
>> I was thinking along similar lines myself. At the least, I think we
>> can't continue to add a short option for every new test type.
>> Instead, maybe we could have --test-type=WHATEVER, and perhaps that
>> then reads whatever.sql from some compiled-in directory. That would
>> allow us to sanely support a moderately large number of tests.

We could call the --test-type option -f, and the "compiled-in directory"
could be the current directory ;-).

> +1. As long as pgbench is considered to be the standard benchmarking
> tool (and I think that it is a general problem that it is), we ought
> to make an effort to give people more options.

Yeah, this sounds like a good approach. A library of standard workload
scripts seems very useful. I've been using custom scripts to benchmark
WAL insertion scalability lately; those also seem like the kind of thing
to put in such a library. I don't know if we should ship the library of
scripts in contrib, or just put them up on a web site, but something
like that...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 19:16:35
Message-ID: CA+U5nM+pGuimVWzewaz_zs3SzDzYi2dxq0KoX7J88SF-zQU_bg@mail.gmail.com

On 21 June 2012 02:57, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 20.06.2012 21:41, Peter Geoghegan wrote:
>>
>> On 20 June 2012 18:42, Robert Haas<robertmhaas(at)gmail(dot)com>  wrote:
>>>
>>> On Wed, Jun 20, 2012 at 3:48 AM, Simon Riggs<simon(at)2ndquadrant(dot)com>
>>>  wrote:
>>>>
>>>> I'm sure Jeff submitted this because of the need for a standard test,
>>>> rather than the wish to actually modify pgbench itself.
>>>>
>>>> Can I suggest that we include a list of standard scripts with pgbench
>>>> for this purpose? These can then be copied alongside the binary when
>>>> we do an install.
>>>
>>>
>>> I was thinking along similar lines myself.  At the least, I think we
>>> can't continue to add a short option for every new test type.
>>> Instead, maybe we could have --test-type=WHATEVER, and perhaps that
>>> then reads whatever.sql from some compiled-in directory.  That would
>>> allow us to sanely support a moderately large number of tests.
>
>
> We could call the --test-type option -f, and the "compiled-in directory"
> could be the current directory ;-).

;-)

Yeah, -f is good.

We should try the path as given, but if the file is not found there, it
should search the share dir for a script of that name.

>> +1. As long as pgbench is considered to be the standard benchmarking
>> tool (and I think that it is a general problem that it is), we ought
>> to make an effort to give people more options.
>
>
> Yeah, this sounds like a good approach. A library of standard workload
> scripts seems very useful. I've been using custom scripts to benchmark WAL
> insertion scalability lately, that also seems like a kind of a thing to put
> in such a library. I don't know if we should ship the library of scripts in
> contrib, or just put them up on a web site, but something like that...

Mix of both, I guess. We just add to contrib any new scripts that
prove useful.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 19:22:11
Message-ID: CAEYLb_WJ5eo0yg_fV6jRQ_GSgQ3bPhd-XZ34Y7NwNCrzLW1eJQ@mail.gmail.com

On 20 June 2012 19:57, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Yeah, this sounds like a good approach. A library of standard workload
> scripts seems very useful. I've been using custom scripts to benchmark WAL
> insertion scalability lately, that also seems like a kind of a thing to put
> in such a library. I don't know if we should ship the library of scripts in
> contrib, or just put them up on a web site, but something like that...

The situation would be made a lot better if we could just find a way
to generalise pgbench a little bit more. I'm thinking about a facility
for specifying new tables in scripts, with a moderate degree of
flexibility as to their definition, data, and the distribution of that
data.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 19:54:06
Message-ID: CA+TgmoZa5C4rZC=4sCrU4SC15vg-kts_LhiP1YR9WVvLgt1d2Q@mail.gmail.com

On Wed, Jun 20, 2012 at 2:57 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 20.06.2012 21:41, Peter Geoghegan wrote:
>> On 20 June 2012 18:42, Robert Haas<robertmhaas(at)gmail(dot)com>  wrote:
>>> On Wed, Jun 20, 2012 at 3:48 AM, Simon Riggs<simon(at)2ndquadrant(dot)com>
>>>  wrote:
>>>> I'm sure Jeff submitted this because of the need for a standard test,
>>>> rather than the wish to actually modify pgbench itself.
>>>>
>>>> Can I suggest that we include a list of standard scripts with pgbench
>>>> for this purpose? These can then be copied alongside the binary when
>>>> we do an install.
>>>
>>> I was thinking along similar lines myself.  At the least, I think we
>>> can't continue to add a short option for every new test type.
>>> Instead, maybe we could have --test-type=WHATEVER, and perhaps that
>>> then reads whatever.sql from some compiled-in directory.  That would
>>> allow us to sanely support a moderately large number of tests.
>
> We could call the --test-type option -f, and the "compiled-in directory"
> could be the current directory ;-).

Well, that sounds a lot like "let's reject the patch". Which would be
OK with me, I guess, but if the goal is to make it easy for all
developers to run that particular test, I'm not sure that's getting us
there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-06-20 20:08:54
Message-ID: 4FE22DD6.5020904@2ndquadrant.com

On 06/20/2012 03:22 PM, Peter Geoghegan wrote:
> The situation would be made a lot better if we could just find a way
> to generalise pgbench a little bit more. I'm thinking about a facility
> for specifying new tables in scripts, with a moderate degree of
> flexibility as to their definition, data, and the distribution of that
> data.

That's probably excess scope creep for this CF though. And converting
this new script to use a longer name like --test would be a useful
forward step for such a larger plan anyway.

I think there's some useful value to breaking out the existing test
scripts into files too, rather than all of them being embedded in the
source code. While there's not any pressure to reclaim existing
switches like -S and -N, having those just be shortcuts for a
"--test=pgbench-select-only.sql" form would make sure that external
script facility worked right. I think there's some value to making it
easier for people to copy the built-in scripts and hack on them too.
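
For example, the built-in -S transaction, pulled out into such a script
file, would look roughly like this (matching the select-only transaction
as described in the pgbench documentation):

```sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```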

The main sticky point I see here that really needs to be improved now,
to turn any of the built-in ones to external scripts, is flagging how to
handle the database scale. Right now the built-in scripts set :scale
based on counting pgbench_branches, while external ones run with "-f" do
not. What things like pgbench-tools end up doing is manually doing that
count and then passing it in with "-s <scale>". That's annoying, and if
this is getting refactored I'd like to improve that too. Getting that
logic in cleanly while staying backward compatible is a bit complicated;
I'm going to think about that for a bit.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-07-01 21:03:34
Message-ID: CAMkU=1zHL1djt2-KFyRAYKjWw1izg6OD-nt=BDP2h_3wUmQ=Og@mail.gmail.com

On Wed, Jun 20, 2012 at 12:32 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 01.06.2012 03:02, Jeff Janes wrote:
>>
>> I've attached a new patch which addresses several of your concerns,
>> and adds the documentation.  The description is much longer than the
>> descriptions of other nearby options, which mostly just give a simple
>> statement of what they do rather than a description of why that is
>> useful.  I don't know if that means I'm starting a good trend, or a
>> bad one, or I'm just putting the exposition in the wrong place.
>>
>> In addition to showing the benefits of coding things on the server
>> side when that is applicable, it also allows hackers to stress parts
>> of the server code that are not easy to stress otherwise.
>
>
> As you mentioned in your original email over a year ago, most of this could
> be done as a custom script. It's nice to have another workload to test, but
> then again, there's an infinite number of workloads that might be
> interesting.

I would argue that my specific proposed transaction does occupy a
somewhat privileged place, as it uses the default table structure, and
it fits a natural progression along the default, -N, -S continuum.
At each step, one or more sources of bottleneck are removed, allowing
different ones to bubble up to the top to be inspected and addressed.

I've written dozens of custom benchmark scripts, and so far this is
the only one I've thought was generally useful enough to think that it
belongs inside the core.

Do I need to make a better argument for why this particular
transaction is as generally useful as I think it is; or is the
objection to the entire idea that any new transactions should be added
at all?

> You can achieve the same with this custom script:
>
> \set loops 512
>
> do $$ DECLARE  sum integer default 0; amount integer; account_id integer;
> BEGIN FOR i IN 1..:loops LOOP   account_id=1 + floor(random() * :scale);
> SELECT abalance into strict amount FROM pgbench_accounts WHERE aid =
> account_id;   sum := sum + amount; END LOOP; END; $$;
>
> It's a bit awkward because it has to be all on one line, and you don't get
> the auto-detection of scale. But those are really the only differences
> AFAICS.

True. And we could fix the one-line thing by allowing lines ending in
"\" to be continued. And this might be mostly backwards compatible,
as I don't think there is much of a reason to end a statement with a
literal "\".

We could make the scale be auto-detected, but that would be more
dangerous as a backward compatibility thing, as people with existing
custom scripts might rely on :scale being set to 1.

But it would defeat my primary purpose of making it easy for us to ask
someone else posting on this list or on "performance" to run this. "Copy
this file someplace, then make sure you use the correct scale, and
remember where you put it, etc." is a much higher overhead.

Also, my patch changes the output formatting in a way that couldn't be
done with a custom script, and might be very confusing if it were not
changed. I don't know how good an argument this is. "Remember to
multiply the reported TPS by 512" is another barrier to easy use.

>
> I think we would be better off improving pgbench to support those things in
> custom scripts. It would be nice to be able to write initialization steps
> that only run once in the beginning of the test.

I can think of 3 possibly desirable behaviors: "run once at -i time",
"run once per pgbench run", and "run once per connection". Should all
of those be implemented?

> You could then put the
> "SELECT COUNT(*) FROM pgbench_branches" there, to do the scale
> auto-detection.

So, we should add a way to do "\set variable_name <SQL QUERY>"?

Would we use a command other than \set to do that, or look at the
thing after the variable to decide if it is a query rather than a
simple expression?
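
For instance (purely hypothetical syntax, just to illustrate the idea;
neither form exists in pgbench today):

```sql
-- hypothetical: overload \set to evaluate a query and store the result
\set scale SELECT count(*) FROM pgbench_branches

-- or a dedicated meta-command, to keep \set unambiguous
\setquery scale SELECT count(*) FROM pgbench_branches
```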

Cheers,

Jeff


From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-07-16 01:22:36
Message-ID: CAEYLb_Xx+EEH1-htr8cWdAwYJKAttFTEczk1-wMAUJq3hHszkQ@mail.gmail.com

On 1 June 2012 01:02, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> Sorry it has taken me a year to get back to this patch. I have wanted
> to use it, and to ask other people to run it and report their results,
> several times recently, so I would like to get it into the core.

Who marked this patch as rejected in the commitfest app? Why?

It would be nice if this information was automatically posted as a
"comment". Surely if a patch is rejected, there should be at least a
one-line explanation.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2012-07-16 01:57:05
Message-ID: 27590.1342403825@sss.pgh.pa.us

Peter Geoghegan <peter(at)2ndquadrant(dot)com> writes:
> On 1 June 2012 01:02, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> Sorry it has taken me a year to get back to this patch. I have wanted
>> to use it, and to ask other people to run it and report their results,
>> several times recently, so I would like to get it into the core.

> Who marked this patch as rejected in the commitfest app? Why?

You don't think this bunch of database weenies would use an app without
a transaction log, I hope ;-). If you go to the "activity log" (link
at the upper right) it shows that Heikki did the deed, at
2012-06-20 07:34:08. The decision seems to have been taken in the
thread ending here:
http://archives.postgresql.org/pgsql-hackers/2012-06/msg01229.php
which I observe you participated in ...

> It would be nice if this information was automatically posted as a
> "comment". Surely if a patch is rejected, there should be at least a
> one-line explanation.

I suppose Heikki should have added a comment pointing to one or another
of those messages.

regards, tom lane