Re: pgbench throttling latency limit

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jan Wieck <jan(at)wi3ck(dot)info>, Rukh Meski <rukh(dot)meski(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench throttling latency limit
Date: 2014-09-12 18:27:33
Message-ID: 54133B15.8060800@gmail.com
Lists: pgsql-hackers

On 9/10/14, 10:57 AM, Fabien COELHO wrote:
> Indeed. I think that people do not like it to change. I remember that
> I suggested to change timestamps to "xxxx.yyyyyy" instead of the
> unreadable "xxxx yyy", and be told not to, because some people have
> tool which process the output so the format MUST NOT CHANGE. So my
> behavior is not to avoid touching anything in this area.

That somewhat hysterical version of events isn't what I said. Heikki has
the right idea for backpatching, so let me expand on that rationale,
with an eye toward whether 9.5 is the right time to deal with this.

Not all software out there will process epoch timestamps with
milliseconds added as a fraction at the end. Being able to read an
epoch time in seconds as an integer is a well-defined standard; the
fraction part is not.

Here's an example of the problem, from a Mac OS X system:

$ date -j -f "%a %b %d %T %Z %Y" "`date`" "+%s"
1410544903
$ date -r 1410544903
Fri Sep 12 14:01:43 EDT 2014
$ date -r 1410544903.532
usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]

The current file format allows any random shell script to use a tool
like cut to pull out the second-resolution timestamp column as an epoch
integer field, then pass it through even a utility as simple as date to
reformat that. And for a lot of people, second resolution is perfectly
fine anyway.
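
To make that concrete, here is a minimal Python sketch of the same
cut-then-date idea. It assumes the per-transaction log columns are
"client_id transaction_no time file_no time_epoch time_us", which is
the layout I recall from the pgbench documentation rather than
something stated in this message; the sample line and the helper names
are illustrative only.

```python
from datetime import datetime, timezone

# Sample line in the style of the pgbench documentation's log example.
# Assumed column order: client_id transaction_no time file_no
# time_epoch time_us.
SAMPLE_LINE = "0 199 2241 0 1175850568 995598"


def parse_log_line(line):
    """Return (epoch_seconds, microseconds) from one space-separated line."""
    fields = line.split()
    # The epoch seconds column is a plain integer, so int() is enough.
    return int(fields[4]), int(fields[5])


def format_epoch(epoch_seconds):
    """Render integer epoch seconds as a UTC timestamp string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


epoch, micros = parse_log_line(SAMPLE_LINE)
print(format_epoch(epoch))
```

Nothing here needs to understand fractions: one split on whitespace
gives an integer that any time library accepts.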

The change you propose will make that job harder for some people, in
order to make the job you're interested in easier. I picked the
simplest possible example, but there are more. In Java, whether epoch
timestamps can have millisecond parts depends on your time library. In
Python, some behavior depends on whether you have 2.6 or earlier. I
don't think gnuplot handles millisecond ones at all yet. The list goes
on and on. Some people will just have to apply a second split to the
timestamp string pgbench outputs, at the period, and use the left side,
where right now they can just split the whole thing on a space.
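
The extra split described above might look like the following sketch.
The fractional "seconds.fraction" field is the proposed format, so its
exact shape is an assumption, and epoch_seconds is a hypothetical
helper name.

```python
def epoch_seconds(field):
    """Return whole epoch seconds from 'ssss' or 'ssss.ffffff'."""
    # partition() splits at the first period; when there is none, the
    # whole string lands on the left side, so both formats work.
    whole, _sep, _fraction = field.partition(".")
    return int(whole)


print(epoch_seconds("1410544903.532"))
print(epoch_seconds("1410544903"))
```

Taking the left side by string split, rather than converting through
float, avoids any rounding surprises and keeps the result an integer
for tools that only accept integer epoch seconds.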

What you want to do is actually fine with me--and as far as I know, I'm
the producer of the most popular pgbench latency parsing script
around--but it will be a new sort of headache. I just wanted the
benefit to outweigh that. Breaking the existing scripts and burning
compatibility with simple utilities like date was not worth the tiny
improvement you wanted in your personal workflow. That's just not how
we do things in PostgreSQL.

If there's a good case that the whole format needs to be changed anyway,
like adding a new field, then we might as well switch to fractional
epoch timestamps at the same time. When I added timestamps to the
latency log in 8.3, parsers that handled milliseconds were even more
rare. Today it's still inconsistent, but the workarounds are good enough
for me now. There are a lot more people using things like Python instead
of bash pipelines here in 2014 too.

--
Greg Smith greg(dot)smith(at)crunchydatasolutions(dot)com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
