Re: gettimeofday is at the end of its usefulness?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: gettimeofday is at the end of its usefulness?
Date: 2014-05-13 22:58:11
Message-ID: 31856.1400021891@sss.pgh.pa.us
Lists: pgsql-hackers

A recent question from Tim Kane prompted me to measure the overhead
costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
are far worse than I thought. On my current server (by no means
lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
at something like 110 nsec per row:

regression=# create table foo as select x as f1 from generate_series(1,1000000) x;
SELECT 1000000
regression=# vacuum foo;
VACUUM
regression=# explain analyze select * from foo;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.053..111.720 rows=1000000 loops=1)
Planning time: 0.222 ms
Execution time: 166.682 ms
(3 rows)

(and, btw, this is a debug build --- without assert and memory
context checks it'd be faster.)

The problem with this number is that a simple test program shows that
gettimeofday() requires about 40 nsec on this hardware. That means
that two-thirds of the above timing measurement is overhead.

To add insult to injury, gettimeofday's output cannot be more precise than
1 microsecond, making its use for measuring sub-microsecond intervals at
best stochastic.

I looked around a bit and found that recent versions of POSIX have a
function clock_gettime() that's a bit more modern than gettimeofday():
at least, the output struct provides nsec rather than usec available
precision. I benchmarked this, with the CLOCK_REALTIME selector, and
found that it also requires about 40nsec, while the output is actually
good to perhaps 10nsec precision. (I base this on seeing no duplicate
readings in a tight loop, so that the value is certainly getting advanced
more often than once every 40 nsec.)

There's also a CLOCK_REALTIME_COARSE selector, which is noticeably faster
--- about 10nsec for me --- but the output appears to only advance once
every millisecond, so it's probably useless for our purposes. The other
selectors mentioned in the Linux man page are considerably slower than
CLOCK_REALTIME for me, suggesting that they actually call into the kernel.

I also tried a loop around a bare "rdtsc" assembly instruction, finding
that that instruction takes about 10nsec. That would be a nice
improvement over gettimeofday, except that using that directly would
involve dealing with cross-CPU skew, which seems like no fun at all.
And I don't really want to get into finding equivalents for non-Intel
architectures, either.
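
For what it's worth, the bare-rdtsc loop can be written like this on x86 with GCC or Clang (this just reads the raw cycle counter; it does nothing about cross-CPU skew, serialization, or converting cycles to wall time, which is exactly the no-fun part):

```c
#include <stdint.h>
#include <x86intrin.h>          /* __rdtsc(), GCC/Clang on x86 */

/* Read the time-stamp counter: cycles since reset on this CPU. */
uint64_t
read_tsc(void)
{
    return __rdtsc();
}

/* Rough cycles-per-reading estimate: n back-to-back readings, averaged.
 * This ignores out-of-order execution; a serious measurement would
 * serialize with rdtscp or cpuid. */
double
tsc_read_cost_cycles(int n)
{
    uint64_t sink = 0;          /* keep the calls from being optimized out */
    uint64_t start = __rdtsc();

    for (int i = 0; i < n; i++)
        sink += __rdtsc();

    uint64_t end = __rdtsc();

    (void) sink;
    return (double) (end - start) / n;
}
```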

Anyway it looks like clock_gettime() might be worth using on Linux
just for the more precise output. It doesn't seem to exist on OS X
though, and I have no idea about elsewhere.

I'm curious if anybody has ideas about other things we might do for
portable high-precision timing.

regards, tom lane

Attachment Content-Type Size
testclock.c text/x-c 657 bytes

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-13 23:33:32
Message-ID: CAM3SWZSyLqKgr-TCga=3kdhYhdW5FnMAcbVLVAsgYyrDwDOC0Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 13, 2014 at 3:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There's also a CLOCK_REALTIME_COARSE selector, which is noticeably faster
> --- about 10nsec for me --- but the output appears to only advance once
> every millisecond, so it's probably useless for our purposes. The other
> selectors mentioned in the Linux man page are considerably slower than
> CLOCK_REALTIME for me, suggesting that they actually call into the kernel.

What Linux kernel version is in use here? Apparently, as I think
you've stated another way, more recent versions have VDSO for this,
which can make a big difference. This article seems like a sensible
guide to all of this:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html

CLOCK_REALTIME_COARSE seemingly influences precision in a way that
allows user space applications to decide on their precision/cost
trade-off, rather than being forced to use the system default (that
procfs surfaces) through gettimeofday():
http://lwn.net/Articles/342018/

I can see a benefit in exposing this trade-off to Postgres code
directly. I still think that a correlated reference period will prove
useful, and while there are a number of ways to amortize the cost of
repeatedly (coarsely) getting the wall time in the ordinary course of
choosing victim buffers, it would be nice to do this too.
--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-14 00:16:29
Message-ID: 836.1400026589@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Tue, May 13, 2014 at 3:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There's also a CLOCK_REALTIME_COARSE selector, which is noticeably faster
>> --- about 10nsec for me --- but the output appears to only advance once
>> every millisecond, so it's probably useless for our purposes. The other
>> selectors mentioned in the Linux man page are considerably slower than
>> CLOCK_REALTIME for me, suggesting that they actually call into the kernel.

> What Linux kernel version is in use here?

Ah, sorry, I should have specified. This is RHEL6.5, current kernel
version 2.6.32-431.17.1.el6.x86_64.

> Apparently, as I think
> you've stated another way, more recent versions have VDSO for this,
> which can make a big difference. This article seems like a sensible
> guide to all of this:
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html

This appears to be talking about RHEL5, which is quite a bit older
(and, I'd guess, trailing edge for anybody who might deploy PG 9.5).
I did confirm that /proc/sys/kernel/vsyscall64 exists and has a default
setting of 1 on RHEL6. Setting it to 0 causes gettimeofday to take
150ns, which probably represents the time for a trivial kernel call.
The MRG extension described on the linked page doesn't seem to be
implemented in stock RHEL6 (setting vsyscall64 to 2 is allowed but
doesn't change behavior compared to 1). However, if I'm reading it
right, all that does is make gettimeofday behave like
clock_gettime(CLOCK_REALTIME_COARSE).

> CLOCK_REALTIME_COARSE seemingly influences precision in a way that
> allows user space applications to decide on their precision/cost
> trade-off, rather than being forced to use the system default (that
> procfs surfaces) through gettimeofday():
> http://lwn.net/Articles/342018/

Yeah, I think these are the same implementations exposed to apps in two
different ways, one being a system-wide switch affecting gettimeofday()
while the other allows the app source code to say which one it wants.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-14 03:34:10
Message-ID: CAM-w4HOeJ4iPxwMRe_vkMEEZtMJrw9rDOLggTiQJW65jj2ZfFw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 13, 2014 at 11:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I also tried a loop around a bare "rdtsc" assembly instruction, finding
> that that instruction takes about 10nsec. That would be a nice
> improvement over gettimeofday, except that using that directly would
> involve dealing with cross-CPU skew, which seems like no fun at all.
> And I don't really want to get into finding equivalents for non-Intel
> architectures, either.

I always assumed the kernel used rdtsc to implement some of the high
performance timers. It can save the current time in a mapped page when
it schedules a process and then in the vdso syscall (ie in user-space)
it can use rdtsc to calculate the offset needed to adjust that
timestamp to the current time. This seems consistent with your
calculations that showed the 40ns overhead with +/- 10ns precision.

I actually think it would be more interesting if we could measure the
overhead and adjust for it. I don't think people are really concerned
with how long EXPLAIN ANALYZE takes to run if they can get accurate
numbers out of it.

Other profiling tools I poked at in the past ran a tight loop around
the profiling code to estimate the time it actually took and then
subtracted that from all the measurements. I think that might work for
the actual clock_gettime overhead. If we did that then we could call
it twice and measure the time spent in the rest of the EXPLAIN ANALYZE
code and subtract that plus the time for the two clock_gettimes from
the run-time...

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-14 03:42:51
Message-ID: 4920.1400038971@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> I actually think it would be more interesting if we could measure the
> overhead and adjust for it.

Actually, that's quite a good thought. The overhead should be a pretty
stable number on any given machine, so in theory we could do this to
high precision. And the numbers I just showed say that on current
x86_64 platforms, the *best we could possibly hope for* in terms of
direct overhead reduction is about 4x. Which is something, but it
hardly makes the problem vanish.

I have a vague feeling that we discussed subtract-the-overhead once before
and thought it wasn't necessary yet. Maybe it's time.

But we also need to be using something that gives better than 1usec
resolution. So I'm still thinking we should use clock_gettime() where
available, and look for alternative APIs where not.

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-14 04:58:07
Message-ID: CAMkU=1xbj6qKd7dmZr6SCvq=uWhsT7cX6HLKG-10FShW0p4zkg@mail.gmail.com
Lists: pgsql-hackers

On Tuesday, May 13, 2014, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> A recent question from Tim Kane prompted me to measure the overhead
> costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
> are far worse than I thought. On my current server (by no means
> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
> at something like 110 nsec per row:
>
> regression=# create table foo as select x as f1 from
> generate_series(1,1000000) x;
> SELECT 1000000
> regression=# vacuum foo;
> VACUUM
> regression=# explain analyze select * from foo;
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------
> Seq Scan on foo (cost=0.00..14425.00 rows=1000000 width=4) (actual
> time=0.053..111.720 rows=1000000 loops=1)
> Planning time: 0.222 ms
> Execution time: 166.682 ms
> (3 rows)
>
> (and, btw, this is a debug build --- without assert and memory
> context checks it'd be faster.)
>
> The problem with this number is that a simple test program shows that
> gettimeofday() requires about 40 nsec on this hardware. That means
> that two-thirds of the above timing measurement is overhead.
>

I'm all for finding something better if we can, but in the meantime this
is certainly not unexpected, and isn't it exactly what "explain
(analyze,timing off)" was invented for?

Cheers,

Jeff


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-14 09:48:37
Message-ID: CA+CSw_s_iTH_nw5XANEQARYuC3MFTqJ9eQZkux_2AocsM6xNLQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 14, 2014 at 6:34 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> I always assumed the kernel used rdtsc to implement some of the high
> performance timers. It can save the current time in a mapped page when
> it schedules a process and then in the vdso syscall (ie in user-space)
> it can use rdtsc to calculate the offset needed to adjust that
> timestamp to the current time. This seems consistent with your
> calculations that showed the 40ns overhead with +/- 10ns precision.

Both gettimeofday and clock_gettime do exactly that. [1]
clock_gettime(CLOCK_MONOTONIC) is the mode of operation we would want
to use here.

> I actually think it would be more interesting if we could measure the
> overhead and adjust for it. I don't think people are really concerned
> with how long EXPLAIN ANALYZE takes to run if they could get accurate
> numbers out of it.

Measuring would also be a good idea so we can automatically turn on
performance counters like IO timing when we know it's not obscenely
expensive.

However, subtracting the overhead will still skew the numbers somewhat
by giving more breathing time for memory and IO prefetching
mechanisms. Another option to consider would be to add a sampling-based
mechanism for low-overhead time attribution. It would be even
better if we could distinguish between time spent waiting on locks vs.
waiting on IO vs. waiting to be scheduled vs. actually executing.

[1] https://github.com/torvalds/linux/blob/master/arch/x86/vdso/vclock_gettime.c#L223

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-14 11:41:26
Message-ID: CA+TgmoZyAACNS2SD48804BQeTz5DaS2WNLZx-rmFXCLYBJm=Fw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 13, 2014 at 11:34 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> I always assumed the kernel used rdtsc to implement some of the high
> performance timers. It can save the current time in a mapped page when
> it schedules a process and then in the vdso syscall (ie in user-space)
> it can use rdtsc to calculate the offset needed to adjust that
> timestamp to the current time. This seems consistent with your
> calculations that showed the 40ns overhead with +/- 10ns precision.

Crazy idea: Instead of trying to time precisely the amount of time we
spend in each node, configure a very-high frequency timer interrupt
(or background thread?) that does:

SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 07:19:16
Message-ID: CADbMkNPEgbP_GThAvUdFu048zUpVGBHOeODUfGPKiEM-a4GXMw@mail.gmail.com
Lists: pgsql-hackers

I posted this on this mailing list before: at Jane Street we have developed
very fast code to get timing information based on the TSC, if available.
It's all OCaml, but well documented and mostly just calls to C functions, so
it should be easy to port to C. We release it under a very liberal license,
so it should be no problem to take the ideas:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

Hope this is useful.

Bene

On Wed, May 14, 2014 at 12:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, May 13, 2014 at 11:34 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> > I always assumed the kernel used rdtsc to implement some of the high
> > performance timers. It can save the current time in a mapped page when
> > it schedules a process and then in the vdso syscall (ie in user-space)
> > it can use rdtsc to calculate the offset needed to adjust that
> > timestamp to the current time. This seems consistent with your
> > calculations that showed the 40ns overhead with +/- 10ns precision.
>
> Crazy idea: Instead of trying to time precisely the amount of time we
> spend in each node, configure a very-high frequency timer interrupt
> (or background thread?) that does:
>
> SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++;
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 07:20:26
Message-ID: CADbMkNP18FUMCeYeGh-kvZR49Auy2wim54s2eziwYzPN3LbLRA@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann <
bgrundmann(at)janestreet(dot)com> wrote:

> I posted this on this mailing list before at Jane Street we have developed
> very fast code to get timing information based on TSC if available. It's
> all ocaml but well documented and mostly just calls to c functions so
> should be easy to port to C and we release it under a very liberal license
> so it should be no problem to take the ideas:
>
> https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli
>
> Hope this is useful.
>
> Bene
>
>
Also, I'm sorry for top-posting. hackers is the only mailing list I'm on
that requires bottom-posting, and some others require top-posting, so this
runs counter to my habits; I only realized after sending...

>
> On Wed, May 14, 2014 at 12:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com>wrote:
>
>> On Tue, May 13, 2014 at 11:34 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> > I always assumed the kernel used rdtsc to implement some of the high
>> > performance timers. It can save the current time in a mapped page when
>> > it schedules a process and then in the vdso syscall (ie in user-space)
>> > it can use rdtsc to calculate the offset needed to adjust that
>> > timestamp to the current time. This seems consistent with your
>> > calculations that showed the 40ns overhead with +/- 10ns precision.
>>
>> Crazy idea: Instead of trying to time precisely the amount of time we
>> spend in each node, configure a very-high frequency timer interrupt
>> (or background thread?) that does:
>>
>> SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++;
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>>
>
>


From: Greg Stark <stark(at)mit(dot)edu>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 10:31:24
Message-ID: CAM-w4HO8tFTLJXAWHh0oL+dePXT8yyggBiXif0g-Vwb3wUupWw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
<bgrundmann(at)janestreet(dot)com> wrote:
> I posted this on this mailing list before at Jane Street we have developed
> very fast code to get timing information based on TSC if available. It's
> all ocaml but well documented and mostly just calls to c functions so should
> be easy to port to C and we release it under a very liberal license so it
> should be no problem to take the ideas:

What OS do you run it on though? How fast is your implementation
compared to the kernel implementation of clock_gettime()?

Are you sure your implementation is actually faster? And are you sure
you're protected against clocks going backwards? I think you should
put some i/o in the loop in the test and start several threads running
it to make it more likely the thread is rescheduled to a different
processor during the test. It suspect you'll find the rdtsc goes
backwards sometimes or produces crazy results when switching
processors.

--
greg


From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 11:04:25
Message-ID: CADbMkNMZWzbrEzUGGocUUijenuEa1upeS71yA7ceE2EXHKzwCw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 15, 2014 at 11:31 AM, Greg Stark <stark(at)mit(dot)edu> wrote:

> On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
> <bgrundmann(at)janestreet(dot)com> wrote:
> > I posted this on this mailing list before at Jane Street we have
> developed
> > very fast code to get timing information based on TSC if available. It's
> > all ocaml but well documented and mostly just calls to c functions so
> should
> > be easy to port to C and we release it under a very liberal license so it
> > should be no problem to take the ideas:
>
> What OS do you run it on though? How fast is your implementation
> compared to the kernel implementation of clock_gettime()?
>
> Are you sure your implementation is actually faster? And are you sure
> you're protected against clocks going backwards? I think you should
> put some i/o in the loop in the test and start several threads running
> it to make it more likely the thread is rescheduled to a different
> processor during the test. I suspect you'll find the rdtsc goes
> backwards sometimes or produces crazy results when switching
> processors.
>
>
There are benchmarks in the link I posted (obtained by a micro-benchmarking
library we developed and use internally, which takes great care to obtain
reliable numbers). We use POSIX threads extensively. We internally spend a
lot of time setting up NTP and monitoring systems so that the clock going
backwards never happens (in other words, I wouldn't be surprised if the
library does NOT work correctly when it does -- our protection is outside).
I do not believe we have seen the rdtsc going backwards on the thread
context switches you mention (and, as said, we use lots of threads). OS?
CentOS 6.5 primarily.

--
> greg
>


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 11:14:39
Message-ID: 20140515111439.GL23943@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-05-15 12:04:25 +0100, Benedikt Grundmann wrote:
> On Thu, May 15, 2014 at 11:31 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>
> > On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
> > <bgrundmann(at)janestreet(dot)com> wrote:
> > > I posted this on this mailing list before at Jane Street we have
> > developed
> > > very fast code to get timing information based on TSC if available. It's
> > > all ocaml but well documented and mostly just calls to c functions so
> > should
> > > be easy to port to C and we release it under a very liberal license so it
> > > should be no problem to take the ideas:
> >
> > What OS do you run it on though? How fast is your implementation
> > compared to the kernel implementation of clock_gettime()?
> >
> > Are you sure your implementation is actually faster? And are you sure
> > you're protected against clocks going backwards? I think you should
> > put some i/o in the loop in the test and start several threads running
> > it to make it more likely the thread is rescheduled to a different
> > processor during the test. I suspect you'll find the rdtsc goes
> > backwards sometimes or produces crazy results when switching
> > processors.
> >
> >
> There are benchmarks in the link I posted (obtained by a micro benchmarking
> library we developed / use internally which takes great care to obtain
> reliable numbers) . We use posix threads extensively. We internally spend
> a lot of time setting up ntp and monitoring systems so that clock backwards
> never happens (so with other words I wouldn't be surprised if the library
> does NOT work correctly when it does -- our protection is outside). I do
> not believe we have seen the tdtsc going backwards on thread context switch
> you mention (and as said we use lots of threads). OS? Centos 6.5
> primarily.

Did you test it on a server with more than one socket (i.e. not just
multiple cores, but distinct CPU packages)? That's where you'd expect
differences in TSC to have funny effects.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 11:21:08
Message-ID: 20140515112108.GM23943@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-05-13 18:58:11 -0400, Tom Lane wrote:
> Anyway it looks like clock_gettime() might be worth using on Linux
> just for the more precise output. It doesn't seem to exist on OS X
> though, and I have no idea about elsewhere.

Agreed that using clock_gettime() would be a good idea. I'd say we
should have a wrapper around it that is able to provide nanosecond
precision. If only gettimeofday() (and whatever Windows is using) is
available, we can dynamically fall back to that.
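
A compile-time sketch of such a wrapper might look like the following (the struct and function names are invented, and this selects at build time rather than dynamically as suggested; the real thing would presumably hide behind configure tests and the existing instr_time macros):

```c
#include <sys/time.h>
#include <time.h>

/* Hypothetical portable timestamp: always carries nanoseconds, even
 * when the underlying source only provides microseconds. */
typedef struct pg_instr_time
{
    long long sec;
    long      nsec;
} pg_instr_time;

static void
pg_instr_gettime(pg_instr_time *t)
{
#if defined(CLOCK_MONOTONIC)
    /* preferred: nsec-precision reading */
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    t->sec = ts.tv_sec;
    t->nsec = ts.tv_nsec;
#else
    /* fallback: gettimeofday(), only usec precision */
    struct timeval tv;

    gettimeofday(&tv, NULL);
    t->sec = tv.tv_sec;
    t->nsec = tv.tv_usec * 1000L;
#endif
}

/* Difference in nanoseconds between two readings. */
static long long
pg_instr_diff_ns(const pg_instr_time *a, const pg_instr_time *b)
{
    return (b->sec - a->sec) * 1000000000LL + (b->nsec - a->nsec);
}
```

Callers would then work in nanoseconds everywhere, and the precision loss of the fallback path stays hidden inside the wrapper.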

> I'm curious if anybody has ideas about other things we might do for
> portable high-precision timing.

It's far from a solve-it-all, but can we perhaps try to coalesce
repeated time measurements? We'll very frequently do a
InstrStopNode();
/* minimal amount of work */
InstrStartNode();
which will measure the time twice. I think there's a fair number of
scenarios where once would be enough. I'll freely admit that I haven't
looked closely enough to determine how we could do that API-wise.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2014-05-15 18:56:14
Message-ID: 20140515185614.GG25053@momjian.us
Lists: pgsql-hackers

On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
> A recent question from Tim Kane prompted me to measure the overhead
> costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
> are far worse than I thought. On my current server (by no means
> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
> at something like 110 nsec per row:

I assume you ran pg_test_timing too:

Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.70 nsec
Histogram of timing durations:
< usec   % of total      count
     1     95.83035   68935459
     2      4.16923    2999133
     4      0.00037        268
     8      0.00004         31
    16      0.00000          1
    32      0.00000          1

My overhead of 41.70 nsec matches yours.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Thom Brown <thom(at)linux(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-06-08 11:41:01
Message-ID: CAA-aLv7m=RU0vKm1myt_uRLYmTviqFcfWZCMPwbXdbKoX4YzYg@mail.gmail.com
Lists: pgsql-hackers

On 15 May 2014 at 19:56, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
> > A recent question from Tim Kane prompted me to measure the overhead
> > costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
> > are far worse than I thought. On my current server (by no means
> > lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
> > at something like 110 nsec per row:
>
> I assume you ran pg_test_timing too:
>
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 41.70 nsec
> Histogram of timing durations:
> < usec   % of total      count
>      1     95.83035   68935459
>      2      4.16923    2999133
>      4      0.00037        268
>      8      0.00004         31
>     16      0.00000          1
>     32      0.00000          1
>
> My overhead of 41.70 nsec matches yours.
>

Did this idea die, or is it still worth considering?

Thom


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-06-08 14:56:23
Message-ID: 20237.1465397783@sss.pgh.pa.us
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> On 15 May 2014 at 19:56, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>> A recent question from Tim Kane prompted me to measure the overhead
>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
>>> are far worse than I thought. On my current server (by no means
>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>> at something like 110 nsec per row:

> Did this idea die, or is it still worth considering?

We still have a problem, for sure. I'm not sure that there was any
consensus on what to do about it. Using clock_gettime(CLOCK_REALTIME)
if available would be a straightforward change that should ameliorate
gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
anything to fix the excessive-overhead problem. The ideas about the
latter were all over the map, and none of them looked easy.

If you're feeling motivated to work on this area, feel free.

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-06-14 20:27:07
Message-ID: 41c674c8-e6e4-a706-37b4-1ed097663a9f@BlueTreble.com

Lists: pgsql-hackers

On 6/8/16 9:56 AM, Tom Lane wrote:
> Thom Brown <thom(at)linux(dot)com> writes:
>> On 15 May 2014 at 19:56, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>>> A recent question from Tim Kane prompted me to measure the overhead
>>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
>>>> are far worse than I thought. On my current server (by no means
>>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>>> at something like 110 nsec per row:
>
>> Did this idea die, or is it still worth considering?
>
> We still have a problem, for sure. I'm not sure that there was any
> consensus on what to do about it. Using clock_gettime(CLOCK_REALTIME)
> if available would be a straightforward change that should ameliorate
> gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> anything to fix the excessive-overhead problem. The ideas about the
> latter were all over the map, and none of them looked easy.
>
> If you're feeling motivated to work on this area, feel free.

Semi-related: someone (Robert I think) recently mentioned investigating
"vectorized" executor nodes, where multiple tuples would be processed in
one shot. If we had that presumably the explain penalty would be a moot
point.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-07-07 18:43:31
Message-ID: CA+TgmoYhXmpUPGv43+N_yantMH5mQJ+hQFHTEHZzakqctZPZ3A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 14, 2016 at 4:27 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> Semi-related: someone (Robert I think) recently mentioned investigating
> "vectorized" executor nodes, where multiple tuples would be processed in one
> shot. If we had that presumably the explain penalty would be a moot point.

Yeah, both Andres and I are interested in that, and I think he's
actively working on it. It would be quite neat if this had the effect
of reducing EXPLAIN ANALYZE's overhead to something trivial.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-07-07 18:48:23
Message-ID: 20160707184823.fvgp2tkrpxvjvddz@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-07-07 14:43:31 -0400, Robert Haas wrote:
> On Tue, Jun 14, 2016 at 4:27 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> > Semi-related: someone (Robert I think) recently mentioned investigating
> > "vectorized" executor nodes, where multiple tuples would be processed in one
> > shot. If we had that presumably the explain penalty would be a moot point.
>
> Yeah, both Andres and I are interested in that, and I think he's
> actively working on it. It would be quite neat if this had the effect
> of reducing EXPLAIN ANALYZE's overhead to something trivial.

I am, and it does reduce the overhead. Depends on the type of plan
though. Index nestloops e.g. don't benefit on the inner side.

Andres


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thom(at)linux(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-08-25 05:12:18
Message-ID: CAJrrPGd2rA6UoPOxUK1WRgKfxRYB7vQZ357zCv+6kx3ced7v4g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2016 at 12:56 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thom(at)linux(dot)com> writes:
>> On 15 May 2014 at 19:56, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>>> A recent question from Tim Kane prompted me to measure the overhead
>>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
>>>> are far worse than I thought. On my current server (by no means
>>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>>> at something like 110 nsec per row:
>
>> Did this idea die, or is it still worth considering?
>
> We still have a problem, for sure. I'm not sure that there was any
> consensus on what to do about it. Using clock_gettime(CLOCK_REALTIME)
> if available would be a straightforward change that should ameliorate
> gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> anything to fix the excessive-overhead problem. The ideas about the
> latter were all over the map, and none of them looked easy.
>
> If you're feeling motivated to work on this area, feel free.

How about using both CLOCK_REALTIME and CLOCK_REALTIME_COARSE
as the clock IDs in clock_gettime, wherever applicable? The COARSE
option could be used wherever no precise timing calculation is required,
because on my laptop there is a significant performance difference
(roughly 8x) compared to CLOCK_REALTIME.

If it is fine, I will try to update the code and send a patch.

Regards,
Hari Babu
Fujitsu Australia


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-09-14 00:53:08
Message-ID: CAJrrPGecetyS5m1F8v4jOUFqy0zGr7hR8kgDMiq1HWeOKB3_oA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 25, 2016 at 3:12 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> On Thu, Jun 9, 2016 at 12:56 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Thom Brown <thom(at)linux(dot)com> writes:
> >> On 15 May 2014 at 19:56, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
> >>>> A recent question from Tim Kane prompted me to measure the overhead
> >>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things
> >>>> are far worse than I thought. On my current server (by no means
> >>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
> >>>> at something like 110 nsec per row:
> >
> >> Did this idea die, or is it still worth considering?
> >
> > We still have a problem, for sure. I'm not sure that there was any
> > consensus on what to do about it. Using clock_gettime(CLOCK_REALTIME)
> > if available would be a straightforward change that should ameliorate
> > gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> > anything to fix the excessive-overhead problem. The ideas about the
> > latter were all over the map, and none of them looked easy.
> >
> > If you're feeling motivated to work on this area, feel free.
>
> How about using both CLOCK_REALTIME and CLOCK_REALTIME_COARSE
> as the clock IDs in clock_gettime, wherever applicable? The COARSE
> option could be used wherever no precise timing calculation is
> required, because on my laptop there is a significant performance
> difference (roughly 8x) compared to CLOCK_REALTIME.
>
> If it is fine, I will try to update the code and send a patch.
>

Attached is a patch that replaces most of the gettimeofday() calls,
except timeofday() (user callable) and the GetCurrentTimestamp
functions.

I didn't add any configure checks or fallback logic to gettimeofday()
for the case where the clock_gettime function is not available.

Any comments in proceeding further?

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
clock_gettime_1.patch application/octet-stream 33.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-26 21:34:15
Message-ID: 15740.1482788055@sss.pgh.pa.us
Lists: pgsql-hackers

Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
> Attached is a patch that replaces most of the gettimeofday() calls,
> except timeofday() (user callable) and the GetCurrentTimestamp functions.
> I didn't add any configure checks or fallback logic to gettimeofday()
> for the case where the clock_gettime function is not available.

Well, of course, configure is the hard part.

I got interested in this area again pursuant to a question from Joel
Jacobson, and looked around to see if things had changed any since
2014. One pleasant surprise is that Apple got around to implementing
clock_gettime() as of the current macOS release (10.12 "Sierra").
That means that pretty much all interesting platforms now have
clock_gettime(), which removes one possible objection to starting to
use it. However, it seems like there is not a lot of commonality
to the best "clockid" to use.

In theory, according to the POSIX spec, CLOCK_MONOTONIC would be what we
want to use for time interval measurement (EXPLAIN ANALYZE), since that
would be impervious to possible clock setting changes. But some
implementations might only optimize the more common CLOCK_REALTIME,
and I found that there are a lot of platform-specific clock IDs that
we might want to consider.

On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
all take about 40ns. Of course gettimeofday() only has 1us resolution,
but the other two have perhaps 10ns resolution (I get no duplicate
readings in a tight loop). Other documented clockids include
CLOCK_REALTIME_COARSE: about 10ns to read, but only 1ms resolution
CLOCK_MONOTONIC_COARSE: about 12ns to read, but only 1ms resolution
CLOCK_MONOTONIC_RAW: full resolution but very slow, ~145ns to read
So CLOCK_MONOTONIC seems to be the thing to use here. It won't buy
us anything speed-wise but the extra resolution will be nice.
However, we need to do more research to see if this holds true on
other popular distros.

On macOS (10.12.2, 2.7GHz x86_64), clock_gettime(CLOCK_REALTIME)
is actually a shade faster than gettimeofday: 40ns versus 46ns.
But it's only giving 1us resolution, no better than gettimeofday.
CLOCK_MONOTONIC is also 1us and it takes 75ns to read. But there's
a CLOCK_MONOTONIC_RAW that takes 44ns to read and seems to offer
full precision -- no duplicate readings in a tight loop. There's
also CLOCK_MONOTONIC_RAW_APPROX which can be read in 23ns but
the resolution is only around half an ms.

I also tried FreeBSD 11.0 on another Mac (2.3GHz x86_64),
and found that gettimeofday as well as basically all their
clock_gettime variants run in 27 to 28 ns; and clock_gettime
reliably delivers full precision, except for CLOCK_SECOND which
is intentionally truncated to 1s precision. So there would be
no need to work with anything but CLOCK_MONOTONIC here.

However, it seems that these impressive results date back only to
June 2012, cf
https://github.com/freebsd/freebsd/commit/13a9f42818f6b89a72b3e40923be809b490400d8
and at least as of that commit, only x86 and x86_64 had the fast
clock_gettime code. Older FreeBSD, or FreeBSD on another architecture,
is likely to be a lot worse. But I lack an installation to try.

I also tried OpenBSD 6.0 on that same Mac, and got pretty horrid
results: gettimeofday, CLOCK_REALTIME, and CLOCK_MONOTONIC all
take about 613ns to read. Ouch. And so does time(NULL); double
ouch. Evidently there's no optimization on this platform and
what we're seeing is the minimum cost for a kernel call. Still,
we do get better precision from clock_gettime than gettimeofday,
so we might as well switch.

So it seems like the configure support we'd need is to detect
whether clock_gettime is available (note on Linux there's also
a library requirement, -lrt), and we would also need a way to
provide a platform-specific choice of clockid; we at least need
enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-27 01:35:05
Message-ID: CAM-w4HNqDpdJRGHxp9P=RUzmVYc_cSzWEnLDpApdWrvpXkDezA@mail.gmail.com
Lists: pgsql-hackers

On Dec 26, 2016 10:35 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

So it seems like the configure support we'd need is to detect
whether clock_gettime is available (note on Linux there's also
a library requirement, -lrt), and we would also need a way to
provide a platform-specific choice of clockid; we at least need
enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

This seems like something that really should be checked at runtime. It's
very specific to the kernel you're running on, not the build
environment, and it can hopefully be measured in only a second or even a
fraction of a second. The only problem would be if other things running
on the system made the test results unpredictable, so that you had a
small chance of getting a very suboptimal choice and were re-rolling the
dice each time you restarted...


From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-27 09:17:37
Message-ID: 20161227091737.5jlcbu4rnqpv2k77@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-12-27 01:35:05 +0000, Greg Stark wrote:
> On Dec 26, 2016 10:35 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>
> So it seems like the configure support we'd need is to detect
> whether clock_gettime is available (note on Linux there's also
> a library requirement, -lrt), and we would also need a way to
> provide a platform-specific choice of clockid; we at least need
> enough smarts to use CLOCK_MONOTONIC_RAW on macOS.
>
> This seems like something that really should be checked at runtime. It's
> very specific to the kernel you're running on, not the build
> environment, and it can hopefully be measured in only a second or even a
> fraction of a second. The only problem would be if other things running
> on the system made the test results unpredictable, so that you had a
> small chance of getting a very suboptimal choice and were re-rolling
> the dice each time you restarted...

I'm pretty strongly against doing performance measurements at
startup. Both the delay and the potential for differing test results
seem like pretty bad consequences.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-27 15:50:27
Message-ID: 13814.1482853827@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-12-27 01:35:05 +0000, Greg Stark wrote:
>> On Dec 26, 2016 10:35 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> So it seems like the configure support we'd need is to detect
>>> whether clock_gettime is available (note on Linux there's also
>>> a library requirement, -lrt), and we would also need a way to
>>> provide a platform-specific choice of clockid; we at least need
>>> enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

>> This seems like something that really should be checked at runtime.

> I'm pretty strongly against doing performance measurements at
> startup. Both the delay and the potential for differing test results
> seem like pretty bad consequences.

Yeah, that doesn't sound great to me either. And I don't entirely
see the point, at least not with what we know now. I am a bit concerned
that we'll find out there are popular platforms where clock_gettime
compiles but fails with ENOSYS, or some similarly unhelpful behavior.
But we won't find that out if we don't try.

regards, tom lane


From: Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-29 20:02:49
Message-ID: 87y3yyfqqe.fsf@mid.deneb.enyo.de
Lists: pgsql-hackers

* Tom Lane:

> On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
> clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
> all take about 40ns. Of course gettimeofday() only has 1us resolution,
> but the other two have perhaps 10ns resolution (I get no duplicate
> readings in a tight loop). Other documented clockids include
> CLOCK_REALTIME_COARSE: about 10ns to read, but only 1ms resolution
> CLOCK_MONOTONIC_COARSE: about 12ns to read, but only 1ms resolution
> CLOCK_MONOTONIC_RAW: full resolution but very slow, ~145ns to read
> So CLOCK_MONOTONIC seems to be the thing to use here. It won't buy
> us anything speed-wise but the extra resolution will be nice.
> However, we need to do more research to see if this holds true on
> other popular distros.

Isn't this very specific to kernel and glibc versions, depending on
things like CONFIG_HZ settings and what level of vDSO support has been
backported?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-29 21:02:31
Message-ID: 4130.1483045351@sss.pgh.pa.us
Lists: pgsql-hackers

Florian Weimer <fw(at)deneb(dot)enyo(dot)de> writes:
> * Tom Lane:
>> On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
>> clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
>> all take about 40ns. Of course gettimeofday() only has 1us resolution,
>> but the other two have perhaps 10ns resolution (I get no duplicate
>> readings in a tight loop).

> Isn't this very specific to kernel and glibc versions, depending on
> things like CONFIG_HZ settings and what level of vDSO support has been
> backported?

No doubt, but I have yet to find a platform where clock_gettime() exists
but performs worse than gettimeofday(). Do you know of one?

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-29 21:05:39
Message-ID: CAEepm=11a1_eYHt4ET_LpgUsuCkzX=dqmCXBkY8ed1=uhZe8mA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 27, 2016 at 10:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I also tried FreeBSD 11.0 on another Mac (2.3GHz x86_64),
> and found that gettimeofday as well as basically all their
> clock_gettime variants run in 27 to 28 ns; and clock_gettime
> reliably delivers full precision, except for CLOCK_SECOND which
> is intentionally truncated to 1s precision. So there would be
> no need to work with anything but CLOCK_MONOTONIC here.
>
> However, it seems that these impressive results date back only to
> June 2012, cf
> https://github.com/freebsd/freebsd/commit/13a9f42818f6b89a72b3e40923be809b490400d8
> and at least as of that commit, only x86 and x86_64 had the fast
> clock_gettime code. Older FreeBSD, or FreeBSD on another architecture,
> is likely to be a lot worse. But I lack an installation to try.

That commit is in every 'production' and 'legacy' release of
FreeBSD[1], meaning as far back as 9.3 (expected to be EoL in the next
few days), because it landed in 9.2 (EoL). ARM support landed in
FreeBSD 11.0[2]. That leaves the following architectures without
fast-path time functions:

macaque:freebsd munro$ git grep 'trivial-vdso_tc.c'
lib/libc/mips/sys/Makefile.inc:SRCS+= trivial-vdso_tc.c
lib/libc/powerpc/Makefile.inc:SRCS+= trivial-vdso_tc.c
lib/libc/powerpc64/Makefile.inc:SRCS+= trivial-vdso_tc.c
lib/libc/powerpcspe/Makefile.inc:SRCS+= trivial-vdso_tc.c
lib/libc/riscv/sys/Makefile.inc:SRCS+= trivial-vdso_tc.c
lib/libc/sparc64/Makefile.inc:SRCS+= trivial-vdso_tc.c

[1] https://www.freebsd.org/releases/
[2] https://github.com/freebsd/freebsd/commit/80e8626b434515d16b3576174438526755336810

--
Thomas Munro
http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-29 21:18:42
Message-ID: 4685.1483046322@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Tue, Dec 27, 2016 at 10:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> However, it seems that these impressive results date back only to
>> June 2012, cf
>> https://github.com/freebsd/freebsd/commit/13a9f42818f6b89a72b3e40923be809b490400d8
>> and at least as of that commit, only x86 and x86_64 had the fast
>> clock_gettime code. Older FreeBSD, or FreeBSD on another architecture,
>> is likely to be a lot worse. But I lack an installation to try.

> That commit is in every 'production' and 'legacy' release of
> FreeBSD[1], meaning as far back as 9.3 (expected to be EoL in the next
> few days), because it landed in 9.2 (EoL).

I'm unclear on whether there's any significant number of people running
out-of-support *BSD releases. If it's not something we have to worry
about, fine.

> That leaves the following architectures without
> fast-path time functions:

> macaque:freebsd munro$ git grep 'trivial-vdso_tc.c'
> lib/libc/mips/sys/Makefile.inc:SRCS+= trivial-vdso_tc.c
> lib/libc/powerpc/Makefile.inc:SRCS+= trivial-vdso_tc.c
> lib/libc/powerpc64/Makefile.inc:SRCS+= trivial-vdso_tc.c
> lib/libc/powerpcspe/Makefile.inc:SRCS+= trivial-vdso_tc.c
> lib/libc/riscv/sys/Makefile.inc:SRCS+= trivial-vdso_tc.c
> lib/libc/sparc64/Makefile.inc:SRCS+= trivial-vdso_tc.c

Yeah, I just finished getting results from FreeBSD 10.3 on PPC
(1.33GHz G4 laptop): gettimeofday takes about 1180 ns and clock_gettime
about 1200 ns. That difference seems to be repeatable, but since it's
only 2% I'm not too fussed about it. Interestingly, it's very easy
to tell that it is entering the kernel, because time(1) shows a
significant fraction of system time:

$ time ./testclock
0 bogus readings
100000000 distinct readings
117.96 real 26.80 user 90.31 sys

The same test on platforms with vDSO support shows zero system time.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2016-12-30 02:02:39
Message-ID: 17524.1483063359@sss.pgh.pa.us
Lists: pgsql-hackers

Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
> Attached is a patch that replaces most of the gettimeofday() calls,
> except timeofday() (user callable) and the GetCurrentTimestamp functions.

I looked at this for awhile and could not convince myself that it's
a good idea. Trying to do s/gettimeofday/clock_gettime/g is not going
to do much for us except create portability headaches. According
to my tests, clock_gettime is not noticeably faster than gettimeofday
on any platform, except that if you use nonstandard clockids like
CLOCK_REALTIME_COARSE then on *some* platforms it's a little bit quicker,
at the cost of being a great deal less precise. But we'd have to research
the existence and effects of nonstandard clockids on every platform.
So AFAICS the only clear advantage to switching is the extra precision
available from clock_gettime.

But ... most of the places you've touched in this patch have neither any
need for sub-microsecond precision nor any great need to worry about
shaving a few ns off the time taken by the call. As far as I can find,
the only place where it's actually worth our trouble to deal with it is
instr_time.h (ie, EXPLAIN ANALYZE and a few other uses).

So I think we should do something more like the attached.

One issue I did not resolve in this WIP patch is what to do with this
gem of abstraction violation in pgbench:

/* no, print raw transactions */
#ifndef WIN32

/* This is more than we really ought to know about instr_time */
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
st->id, st->cnt, st->use_file,
(long) now->tv_sec, (long) now->tv_usec);
else
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
st->id, st->cnt, latency, st->use_file,
(long) now->tv_sec, (long) now->tv_usec);
#else

/* On Windows, instr_time doesn't provide a timestamp anyway */
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d 0 0",
st->id, st->cnt, st->use_file);
else
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d 0 0",
st->id, st->cnt, latency, st->use_file);
#endif

We could either rip out the non-Windows code path entirely, or do
something about providing an honest elapsed-time measurement, perhaps
by doing INSTR_TIME_GET_DOUBLE() on the diff from run start to "now".
Given that we're calling fprintf, I doubt that the extra arithmetic
needed for that is a big problem.
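
The suggested honest elapsed-time measurement, the double-precision
difference from run start to "now" (what INSTR_TIME_GET_DOUBLE applied
to the diff would yield), amounts to something like this sketch
(illustrative, not the actual pgbench change):

```c
/* Illustrative helper: elapsed seconds from `start` to `now` as a
 * double, the value a per-transaction log line could carry instead of
 * raw struct timeval fields. */
#include <assert.h>
#include <time.h>

static double
elapsed_seconds(struct timespec start, struct timespec now)
{
    return (now.tv_sec - start.tv_sec)
         + (now.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, a reading taken 1.75 seconds into the run would log as
"1.750000" with %.6f, regardless of which clockid produced the raw
timespecs, so the log format no longer needs to know what instr_time
contains.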

regards, tom lane

Attachment Content-Type Size
clock_gettime_for_EXPLAIN_1.patch text/x-diff 9.6 KB

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2017-01-04 04:00:32
Message-ID: CAJrrPGfDYekY7PMDcQdi2Zj7KpdLO_ZuAVNv16FMbp98EGEeJw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 30, 2016 at 1:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
> > Attached is a patch that replaces most of the gettimeofday() calls,
> > except timeofday() (user callable) and the GetCurrentTimestamp functions.
>
> I looked at this for awhile and could not convince myself that it's
> a good idea. Trying to do s/gettimeofday/clock_gettime/g is not going
> to do much for us except create portability headaches. According
> to my tests, clock_gettime is not noticeably faster than gettimeofday
> on any platform, except that if you use nonstandard clockids like
> CLOCK_REALTIME_COARSE then on *some* platforms it's a little bit quicker,
> at the cost of being a great deal less precise. But we'd have to research
> the existence and effects of nonstandard clockids on every platform.
> So AFAICS the only clear advantage to switching is the extra precision
> available from clock_gettime.
>
> But ... most of the places you've touched in this patch have neither any
> need for sub-microsecond precision nor any great need to worry about
> shaving a few ns off the time taken by the call. As far as I can find,
> the only place where it's actually worth our trouble to deal with it is
> instr_time.h (ie, EXPLAIN ANALYZE and a few other uses).
>
> So I think we should do something more like the attached.
>

Thanks for your valuable input.

Since gettimeofday() is obsolete and any further enhancements are
likely to happen only to clock_gettime(), I changed it in many
places.

Yes, I agree that unless clock_gettime() performs faster than
gettimeofday() on all platforms, we can retain gettimeofday().

Regards,
Hari Babu
Fujitsu Australia


From: Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2017-01-13 20:48:51
Message-ID: 87wpdy1ya4.fsf@mid.deneb.enyo.de
Lists: pgsql-hackers

* Tom Lane:

> Florian Weimer <fw(at)deneb(dot)enyo(dot)de> writes:
>> * Tom Lane:
>>> On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
>>> clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
>>> all take about 40ns. Of course gettimeofday() only has 1us resolution,
>>> but the other two have perhaps 10ns resolution (I get no duplicate
>>> readings in a tight loop).
>
>> Isn't this very specific to kernel and glibc versions, depending on
>> things like CONFIG_HZ settings and what level of vDSO support has been
>> backported?
>
> No doubt, but I have yet to find a platform where clock_gettime() exists
> but performs worse than gettimeofday(). Do you know of one?

ppc64le with all the vDSO fixes for clock_gettime?

glibc has some test cases which fail because clock_gettime gives
inconsistent results. This has been fixed in current kernels, but I
don't know if everyone uses them.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gettimeofday is at the end of its usefulness?
Date: 2017-01-13 20:54:24
Message-ID: 21884.1484340864@sss.pgh.pa.us
Lists: pgsql-hackers

Florian Weimer <fw(at)deneb(dot)enyo(dot)de> writes:
> glibc has some test cases which fail because clock_gettime gives
> inconsistent results. This has been fixed in current kernels, but I
> don't know if everyone uses them.

Hmm, for which clock IDs?

regards, tom lane