Re: review: pgbench - aggregation of info written into log

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: review: pgbench - aggregation of info written into log
Date: 2012-11-11 23:34:40
Message-ID: 50A03610.8050801@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

attached is a v4 of the patch. There are not many changes, mostly some
simple tidying up, except for handling the Windows.

After a bit more investigation, it seems to me that we can't really get
the same behavior as in other systems - basically the timestamp is
unavailable so we can't log the interval start. So it seems to me the
best we can do is to disable this option on Windows (and this is done in
the patch). So when you try to use "--aggregate-interval" on Win it will
complain it's an unknown option.

Now that I think of it, there might be a better solution that would not
mimic the Linux/Unix behaviour exactly - we do have elapsed time since
the start of the benchmark, so maybe we should use this instead of the
timestamp? I mean on systems with reasonable gettimeofday we'd get

1345828501 5601 1542744 483552416 61 2573
1345828503 7884 1979812 565806736 60 1479
1345828505 7208 1979422 567277552 59 1391
1345828507 7685 1980268 569784714 60 1398
1345828509 7073 1979779 573489941 236 1411

but on Windows we'd get

0 5601 1542744 483552416 61 2573
2 7884 1979812 565806736 60 1479
4 7208 1979422 567277552 59 1391
6 7685 1980268 569784714 60 1398
8 7073 1979779 573489941 236 1411

i.e. the first column is "seconds since start of the test". Hmmmm, and
maybe we could call 'gettimeofday' at the beginning, to get the
timestamp of the test start (AFAIK the issue on Windows is the
resolution, but it should be enough). Then we could add it up with the
elapsed time and we'd get the same output as on Linux.

And maybe this could be done in regular pgbench execution too? But I
really need help from someone who knows Windows and can test this.

Comments regarding Pavel's reviews are below:

On 2.10.2012 20:08, Pavel Stehule wrote:
> Hello
>
> I did review of pgbench patch
> http://archives.postgresql.org/pgsql-hackers/2012-09/msg00737.php
>
> * this patch is cleanly patched
>
> * I had a problem with doc
>
> make[1]: Entering directory `/home/pavel/src/postgresql/doc/src/sgml'
> openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> -c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl
> -t sgml -i output-html -V html-index postgres.sgml
> openjade:pgbench.sgml:767:8:E: document type does not allow element
> "TITLE" here; missing one of "ABSTRACT", "AUTHORBLURB", "MSGSET",
> "CALLOUTLIST", "ITEMIZEDLIST", "ORDEREDLIST", "SEGMENTEDLIST",
> "VARIABLELIST", "CAUTION", "IMPORTANT", "NOTE", "TIP", "WARNING",
> "FORMALPARA", "BLOCKQUOTE", "EQUATION", "EXAMPLE", "FIGURE", "TABLE",
> "PROCEDURE", "SIDEBAR", "QANDASET", "REFSECT3" start-tag
> make[1]: *** [HTML.index] Error 1
> make[1]: *** Deleting file `HTML.index'
> make[1]: Leaving directory `/home/pavel/src/postgresql/doc/src/sgml

Hmmm, I've never got the docs to build at all, all I do get is a heap of
some SGML/DocBook related issues. Can you investigate a bit more where's
the issue? I don't remember messing with the docs in a way that might
cause this ... mostly copy'n'paste edits.

> * pgbench is compiled without warnings
>
> * there is a few issues in source code
>
> +
> + /* should we aggregate the results or not? */
> + if (use_log_agg)
> + {
> +
> + /* are we still in the same interval? if yes, accumulate the
> + * values (print them otherwise) */
> + if (agg->start_time + use_log_agg >= INSTR_TIME_GET_DOUBLE(now))
> + {
> +

Errr, so what are the issues here?

>
> * a real time range for aggregation is dynamic (pgbench is not
> realtime application), so I am not sure if following constraint has
> sense
>
> + if ((duration > 0) && (use_log_agg > 0) && (duration % use_log_agg != 0)) {
> + fprintf(stderr, "duration (%d) must be a multiple of aggregation
> interval (%d)\n", duration, use_log_agg);
> + exit(1);
> + }

I'm not sure what might be the issue here either. If the test duration
is set (using "-T" option), then this kicks in and says that the
duration should be a multiple of aggregation interval. Seems like a
sensible assumption to me. Or do you think this is check should be removed?

> * it doesn't flush last aggregated data (it is mentioned by Tomas:
> "Also, I've realized the last interval may not be logged at all - I'll
> take a look into this in the next version of the patch."). And it can
> be significant for higher values of "use_log_agg"

Yes, and I'm still not sure how to fix this in practice. But I do have
this on my TODO.

>
> * a name of variable "use_log_agg" is not good - maybe "log_agg_interval" ?

I've changed it to agg_interval.

regards
Tomas

Attachment Content-Type Size
pgbench-aggregated-v4.patch text/plain 11.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-11-11 23:59:29 Re: Enabling Checksums
Previous Message Tom Lane 2012-11-11 23:25:57 Re: Inadequate thought about buffer locking during hot standby replay