pgbench --tuple-size option

Lists: pgsql-hackers
From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: pgbench --tuple-size option
Date: 2014-08-15 09:46:52
Message-ID: alpine.DEB.2.10.1408151141570.29316@sto


After publishing some test results with pgbench on SSD with varying page
size, Josh Berkus pointed out that pgbench uses small 100-byte tuples,
and that results may be different with other tuple sizes.

This patch adds an option to change the default tuple size, so that this
can be tested easily.
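To make concrete what a tuple-size option changes, here is a minimal sketch (not the patch itself) of building the pgbench_accounts DDL with a configurable filler width. The column layout (aid, bid, abalance as int, filler char(84)) follows stock pgbench as far as we know. The helper names `filler_width` and `accounts_ddl`, and the rule mapping a target tuple size to a filler width (size minus 16 bytes, so that the default 100 yields char(84)), are hypothetical assumptions for illustration only.

```python
# Hypothetical sketch: derive the pgbench_accounts DDL from a target tuple size.
# Assumption: stock pgbench declares filler char(84) for a ~100-byte tuple, so we
# reserve 16 bytes for the non-filler part. This mapping is illustrative only and
# is not taken from the patch.
DEFAULT_TUPLE_SIZE = 100
NON_FILLER_BYTES = 16  # assumed: 3 x 4-byte ints + 4 bytes of overhead


def filler_width(tuple_size: int) -> int:
    """Return the char(n) width for the filler column (hypothetical rule)."""
    if tuple_size <= NON_FILLER_BYTES:
        raise ValueError(f"tuple size must exceed {NON_FILLER_BYTES} bytes")
    return tuple_size - NON_FILLER_BYTES


def accounts_ddl(tuple_size: int = DEFAULT_TUPLE_SIZE) -> str:
    """Build a CREATE TABLE statement whose filler width depends on tuple_size."""
    width = filler_width(tuple_size)
    # The width is formatted directly into the column declaration.
    return (
        "create table pgbench_accounts("
        "aid int not null, bid int, abalance int, "
        f"filler char({width}))"
    )


if __name__ == "__main__":
    print(accounts_ddl())      # default size
    print(accounts_ddl(1000))  # larger tuples
```

Formatting the width straight into the column declaration keeps the whole DDL readable in one place, rather than appending the size with a separate printf after the declaration.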

--
Fabien.

Attachment Content-Type Size
pgbench-tupsize-1.patch text/x-diff 4.7 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 09:51:06
Message-ID: 20140815095106.GG28805@awork2.anarazel.de

On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote:
>
> After publishing some test results with pgbench on SSD with varying page
> size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and
> that results may be different with other tuple sizes.
>
> This patch adds an option to change the default tuple size, so that this can
> be tested easily.

I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 09:58:41
Message-ID: alpine.DEB.2.02.1408151152460.14344@andorre


Hello Andres,

>> This patch adds an option to change the default tuple size, so that this can
>> be tested easily.
>
> I don't think it's beneficial to put this into pgbench. There really
> isn't a relevant benefit over using a custom script here.

The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument
does not apply here.

--
Fabien.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 10:01:30
Message-ID: 20140815100130.GH28805@awork2.anarazel.de

On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote:
>
> Hello Andres,
>
> >>This patch adds an option to change the default tuple size, so that this can
> >>be tested easily.
> >
> >I don't think it's beneficial to put this into pgbench. There really
> >isn't a relevant benefit over using a custom script here.
>
> The scripts to run are the standard ones. The difference is in the
> *initialization* phase (-i), namely the filler attribute size. There is no
> custom script for initialization in pgbench, so ISTM that this argument does
> not apply here.

The custom initialization is to run a manual ALTER after the
initialization.
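Andres's suggestion amounts to running ordinary DDL once a stock `pgbench -i` has finished. A minimal sketch that generates such statements is below; `ALTER TABLE ... ALTER COLUMN ... TYPE` is standard PostgreSQL syntax, while the helper name `widen_filler_sql`, the default table list, and the example width are hypothetical choices for illustration. Such an ALTER may rewrite the already-loaded table, which is the kind of storage side effect raised in the next reply.

```python
from typing import List, Sequence

# Standard pgbench tables declaring a char(n) filler column. pgbench_history is
# omitted here by choice, since it starts out empty after initialization.
PGBENCH_TABLES = ("pgbench_accounts", "pgbench_branches", "pgbench_tellers")


def widen_filler_sql(width: int, tables: Sequence[str] = PGBENCH_TABLES) -> List[str]:
    """Return ALTER statements resizing each table's filler to char(width)."""
    if width < 1:
        raise ValueError("width must be positive")
    return [
        f"ALTER TABLE {table} ALTER COLUMN filler TYPE char({width});"
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical target width; the printed statements could be fed to psql.
    for stmt in widen_filler_sql(984):
        print(stmt)
```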

Greetings,

Andres Freund



From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 10:17:31
Message-ID: alpine.DEB.2.02.1408151206180.14344@andorre


>>> I don't think it's beneficial to put this into pgbench. There really
>>> isn't a relevant benefit over using a custom script here.
>>
>> The scripts to run are the standard ones. The difference is in the
>> *initialization* phase (-i), namely the filler attribute size. There is no
>> custom script for initialization in pgbench, so ISTM that this argument does
>> not apply here.
>
> The custom initialization is to run a manual ALTER after the
> initialization.

Sure, it can be done this way.

I'm not sure about the implications of ALTER for the table storage, so I
prefer all benchmarks to run exactly the same straightforward way in all
cases, to avoid unwanted effects on what I'm trying to measure, which
is already noisy and unstable enough.

--
Fabien.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 10:24:20
Message-ID: 20140815102419.GI28805@awork2.anarazel.de

On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote:
>
> >>>I don't think it's beneficial to put this into pgbench. There really
> >>>isn't a relevant benefit over using a custom script here.
> >>
> >>The scripts to run are the standard ones. The difference is in the
> >>*initialization* phase (-i), namely the filler attribute size. There is no
> >>custom script for initialization in pgbench, so ISTM that this argument does
> >>not apply here.
> >
> >The custom initialization is to run a manual ALTER after the
> >initialization.
>
> Sure, it can be done this way.
>
> I'm not sure about the implication of ALTER on the table storage,

Should be fine in this case. But if that's what you're concerned about
(understandably), it seems to make more sense to split -i into two steps:
one to create the tables, and another to fill them. That would allow
doing manual steps in between.
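The split initialization proposed here can be sketched as a create phase, an arbitrary list of user steps, and a fill phase. The sketch below only records the statements in order instead of sending them to a server; `phased_init`, `create_tables`, `fill_tables` and `widen_filler` are hypothetical names, not existing pgbench functions. The point it shows is ordering: a manual ALTER placed between the phases runs while the table is still empty, so no loaded data has to be rewritten.

```python
from typing import Callable, List

# A step receives the shared log and appends the (illustrative) SQL it would run.
Step = Callable[[List[str]], None]


def create_tables(log: List[str]) -> None:
    # Phase 1: schema only, no rows yet (layout as in stock pgbench).
    log.append("create table pgbench_accounts("
               "aid int not null, bid int, abalance int, filler char(84))")


def fill_tables(log: List[str]) -> None:
    # Phase 2: data load; a real tool would COPY the rows here instead of logging.
    log.append("-- load pgbench_accounts rows")


def phased_init(between: List[Step]) -> List[str]:
    """Create tables, run user-supplied steps, then fill (hypothetical API)."""
    log: List[str] = []
    create_tables(log)
    for step in between:
        step(log)
    fill_tables(log)
    return log


def widen_filler(log: List[str]) -> None:
    # Example manual step, executed while the table is still empty.
    log.append("ALTER TABLE pgbench_accounts ALTER COLUMN filler TYPE char(984)")


if __name__ == "__main__":
    for stmt in phased_init([widen_filler]):
        print(stmt)
```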

Greetings,

Andres Freund



From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 11:33:20
Message-ID: alpine.DEB.2.02.1408151317160.14344@andorre


>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case. But if that's what you're concerned about -
> understandably -

Indeed, my (long) experience with benchmarks is that benchmarking is much
more complicated than it looks if you want to really understand what you
are getting, and to get anything meaningful.

> it seems to make more sense to split -i into two. One to create the
> tables, and another to fill them. That'd allow to do manual stuff
> in between.

Hmmm. This would mean many more changes than the pretty trivial patch I
submitted: more options (two-part init plus compatibility with the previous
behavior), splitting the "init" function, a dependency and new error cases
to check (the tables must exist before they can be filled), some options
applying to the first part while others apply to the second, which would in
any case lead to significantly more complicated documentation... A lot of
trouble for my use case, which is to answer Josh's pertinent comments and
to be able to test the "tuple size" factor easily. Moreover, I would reject
it myself as too much trouble for a small benefit.

Feel free to reject the patch if you do not want it. I think that its
cost/benefit is reasonable (one small option, small code changes, some
benefit for people who want to measure performance in various cases).

--
Fabien.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 11:36:39
Message-ID: 20140815113639.GJ28805@awork2.anarazel.de

On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
> >it seems to make more sense to split -i into two. One to create the
> >tables, and another to fill them. That'd allow to do manual stuff
> >in between.
>
> Hmmm. This would mean much more changes than the pretty trivial patch I
> submitted

FWIW, I find that patch really ugly. Adding the filler's width in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.

>: more options (2 parts init + compatibility with the previous
> case), splitting the "init" function, having a dependency and new error
> cases to check (you must have the table to fill them), some options apply to
> first part while other apply to second part, which would lead in any case to
> a significantly more complicated documentation... a lot of trouble for my use
> case to answer Josh pertinent comments, and to be able to test the "tuple
> size" factor easily. Moreover, I would reject it myself as too much trouble
> for a small benefit.

Well, it's something more generic, because it allows you to do more...

> Feel free to reject the patch if you do not want it. I think that its
> cost/benefit is reasonable (one small option, small code changes, some
> benefit for people who want to measure performance in various cases).

I personally think this isn't worth the price. But I'm just one guy.

Greetings,

Andres Freund



From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 12:42:54
Message-ID: CAHGQGwGMdoUFzwyHrjCyPCpE6xE9dccokwBjNcrk1wxZcF+SWQ@mail.gmail.com

On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
>> >it seems to make more sense to split -i into two. One to create the
>> >tables, and another to fill them. That'd allow to do manual stuff
>> >in between.
>>
>> Hmmm. This would mean much more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.
>
>>: more options (2 parts init + compatibility with the previous
>> case), splitting the "init" function, having a dependency and new error
>> cases to check (you must have the table to fill them), some options apply to
>> first part while other apply to second part, which would lead in any case to
>> a significantly more complicated documentation... a lot of trouble for my use
>> case to answer Josh pertinent comments, and to be able to test the "tuple
>> size" factor easily. Moreover, I would reject it myself as too much trouble
>> for a small benefit.
>
> Well, it's something more generic, because it allows you to do more...
>
>> Feel free to reject the patch if you do not want it. I think that its
>> cost/benefit is reasonable (one small option, small code changes, some
>> benefit for people who want to measure performance in various cases).
>
> I personally think this isn't worth the price. But I'm just one guy.

I also don't like this feature. The benefit of this option seems too small.
If we apply this, we might want to support other options as well: for
example, an option to change the data type of each column, an option to
create a new index using "minmax", an option to change the fillfactor of
each table, etc. There are countless such options, and I'm afraid it would
be really hard to support them all.

Regards,

--
Fujii Masao


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-15 13:28:20
Message-ID: alpine.DEB.2.02.1408151521000.14344@andorre


>> Hmmm. This would mean much more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.

Indeed. I'm not too proud of that particular point either :-) You are right
that it deserves at least a clear comment. Putting the varying size in the
DDL string would mean using vsprintf and splitting up the query building
some more, which I do not find desirable.

> [...]
> Well, it's something more generic, because it allows you to do more...

Apart from the fact that I do not need it (at least right now) and that it
is more work, my opinion is that it would be rejected. Not a strong
incentive to spend time in that direction.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench --tuple-size option
Date: 2014-08-16 12:23:37
Message-ID: alpine.DEB.2.10.1408152255110.29316@sto


>>> The custom initialization is to run a manual ALTER after the
>>> initialization.
>>
>> Sure, it can be done this way.
>>
>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case.

After some testing and laughing, my conclusion is "not fine at all". The
"filler" attributes in pgbench use the default "EXTENDED" storage, which
means values may be compressed... As the default value is '' (padded with
blanks to the declared char width), compression, when attempted for large
sizes, works very well, and the performance is the same as with a smaller
(declared) tuple :-) Probably not what the benchmark designer intended.
Conclusion: I need an ALTER TABLE anyway to change the STORAGE. Or maybe
pgbench should always do it anyway...
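A quick way to see why the enlarged filler did not behave like a larger tuple: an empty value in a char(n) column is stored blank-padded to n characters, and a long run of identical bytes compresses to almost nothing. PostgreSQL only attempts this compression once a row exceeds the TOAST threshold (about 2 kB with the default 8 kB block size), which matches the "for large sizes" observation. The sketch below uses Python's `zlib` as a stand-in for PostgreSQL's own pglz compressor, so absolute sizes differ and only the trend is meant to carry over. One plausible form of the STORAGE change is `ALTER TABLE pgbench_accounts ALTER COLUMN filler SET STORAGE EXTERNAL;`, which permits out-of-line storage but disables compression (PLAIN would also avoid compression while keeping values inline); the thread does not say which was used.

```python
import zlib

# Stand-in demo: zlib instead of PostgreSQL's pglz, so absolute sizes differ.


def blank_filler(width: int) -> bytes:
    """Model an empty string stored as char(width): blank-padded to width."""
    return b" " * width


def compressed_size(width: int) -> int:
    """Size in bytes of the blank-padded filler after zlib compression."""
    return len(zlib.compress(blank_filler(width)))


if __name__ == "__main__":
    for width in (84, 1000, 4000):
        raw = len(blank_filler(width))
        print(f"char({width}): {raw} bytes raw -> {compressed_size(width)} bytes compressed")
```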

Conclusion 2: I've marked the submission as "rejected" since both you and
Fujii dislike it; although I find it useful, I can do without it quite
easily.

--
Fabien.