droped out precise time calculations in src/interfaces/libpq/fe-connect.c

Lists: pgsql-hackers
From: Denis A Ustimenko <denis(at)oldham(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 03:32:19
Message-ID: 20021014033219.GB966@store.oldham.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

----- Forwarded message from Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> -----

I don't know. Would you ask hackers list, and perhaps CC the author of
that patch.

---------------------------------------------------------------------------

Denis A Ustimenko wrote:
> Bruce, why have all precise time calculations been droped out in 1.206? If there is no
> gettimeofday in win32?
>
----- End forwarded message -----

--
Regards
Denis


From: Joe Conway <mail(at)joeconway(dot)com>
To: Denis A Ustimenko <denis(at)oldham(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 04:02:55
Message-ID: 3DAA41EF.2070604@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Denis A Ustimenko wrote:
>>Bruce, why have all precise time calculations been droped out in 1.206? If there is no
>>gettimeofday in win32?

gettimeofday was not portable to win32 (at least not that I could find) and
hence broke the win32 build of the clients.

Joe


From: Denis A Ustimenko <denis(at)oldham(dot)ru>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 04:48:27
Message-ID: 20021014044827.GA1196@store.oldham.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> Denis A Ustimenko wrote:
> >>Bruce, why have all precise time calculations been droped out in 1.206?
> >>If there is no
> >>gettimeofday in win32?
>
> gettimeofday was not portable to win32 (at least not that I could find) and
> hence broke the win32 build of the clients.
>

GetSystemTimeAsFileTime should help.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp

--
Regards
Denis


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Denis A Ustimenko <denis(at)oldham(dot)ru>
Cc: Joe Conway <mail(at)joeconway(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 05:00:07
Message-ID: 200210140500.g9E507n14974@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Denis A Ustimenko wrote:
> On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> > Denis A Ustimenko wrote:
> > >>Bruce, why have all precise time calculations been droped out in 1.206?
> > >>If there is no
> > >>gettimeofday in win32?
> >
> > gettimeofday was not portable to win32 (at least not that I could find) and
> > hence broke the win32 build of the clients.
> >
>
> GetSystemTimeAsFileTime should help.
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp

It's not clear to me how we could get this into something we can deal
with like gettimeofday.

I looked at the Apache APR project, and they have a routine that returns
the microseconds since 1970 for Unix:

/* NB NB NB NB This returns GMT!!!!!!!!!! */
APR_DECLARE(apr_time_t) apr_time_now(void)
{
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec * APR_USEC_PER_SEC + tv.tv_usec;
}

and for Win32:

APR_DECLARE(apr_time_t) apr_time_now(void)
{
LONGLONG aprtime = 0;
FILETIME time;
#ifndef _WIN32_WCE
GetSystemTimeAsFileTime(&time);
#else
SYSTEMTIME st;
GetSystemTime(&st);
SystemTimeToFileTime(&st, &time);
#endif
FileTimeToAprTime(&aprtime, &time);
return aprtime;
}

and FileTimeToAprTime() is:

/* Number of micro-seconds between the beginning of the Windows epoch
* (Jan. 1, 1601) and the Unix epoch (Jan. 1, 1970)
*/
#define APR_DELTA_EPOCH_IN_USEC APR_TIME_C(11644473600000000);


__inline void FileTimeToAprTime(apr_time_t *result, FILETIME *input)
{
/* Convert FILETIME one 64 bit number so we can work with it. */
*result = input->dwHighDateTime;
*result = (*result) << 32;
*result |= input->dwLowDateTime;
*result /= 10; /* Convert from 100 nano-sec periods to micro-seconds. */
*result -= APR_DELTA_EPOCH_IN_USEC; /* Convert from Windows epoch to Unix epoch */
return;
}

So, this is what needs to be dealt with to get it working.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Joe Conway <mail(at)joeconway(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 05:15:52
Message-ID: 3DAA5308.3040300@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> So, this is what needs to be dealt with to get it working.
>

More to the point, why is sub-second precision needed in this function?
Connection timeout is given to us in whole seconds (1.205 code, i.e. prior to
the patch in question):

remains.tv_sec = atoi(conn->connect_timeout);
if (!remains.tv_sec)
{
conn->status = CONNECTION_BAD;
return 0;
}
remains.tv_usec = 0;
rp = &remains;

So there is no way to bail out prior to one second. Once you accept that the
timeout is >= 1 second, and in whole second increments, why does it need
sub-second resolution?

Joe


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 05:48:41
Message-ID: 200210140548.g9E5mf826940@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway wrote:
> Bruce Momjian wrote:
> > So, this is what needs to be dealt with to get it working.
> >
>
> More to the point, why is sub-second precision needed in this function?
> Connection timeout is given to us in whole seconds (1.205 code, i.e. prior to
> the patch in question):
>
> remains.tv_sec = atoi(conn->connect_timeout);
> if (!remains.tv_sec)
> {
> conn->status = CONNECTION_BAD;
> return 0;
> }
> remains.tv_usec = 0;
> rp = &remains;
>
> So there is no way to bail out prior to one second. Once you accept that the
> timeout is >= 1 second, and in whole second increments, why does it need
> sub-second resolution?

It could be argued that our seconds are not as exact as they could be
with subsecond timing. Not sure it is worth it, but I can see the
point. I would like to remove the tv_usec test because it suggests we
are doing something with microseconds when we are not. Also, should we
switch to a simple time() call, rather than gettimeofday()?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Joe Conway <mail(at)joeconway(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 05:59:40
Message-ID: 3DAA5D4C.5010402@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> It could be argued that our seconds are not as exact as they could be
> with subsecond timing. Not sure it is worth it, but I can see the
> point.

Well, if we were specifying the timeout in microseconds instead of seconds, it
would make sense to have better resolution. But when you can only specify the
timeout in seconds, the internal time comparison doesn't need to be any more
accurate than seconds (IMHO anyway).

> are doing something with microseconds when we are not. Also, should we
> switch to a simple time() call, rather than gettimeofday()?
>

Already done -- that's what Denis is unhappy about.

Joe


From: Denis A Ustimenko <denis(at)oldham(dot)ru>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 06:40:43
Message-ID: 20021014064043.GA538@denis.oldham.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
> Well, if we were specifying the timeout in microseconds instead of seconds,
> it would make sense to have better resolution. But when you can only
> specify the timeout in seconds, the internal time comparison doesn't need
> to be any more accurate than seconds (IMHO anyway).

Actually we have the state machine in connectDBComplete() and the timeout is
set for machine as the whole. Therefore if 1 second timeout is seted for the
connectDBComplete() the timeout of particualr iteration of loop can be less
then 1 second.

--
Regards
Denis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Denis A Ustimenko <denis(at)oldham(dot)ru>
Cc: Joe Conway <mail(at)joeconway(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 13:53:27
Message-ID: 11923.1034603607@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Denis A Ustimenko <denis(at)oldham(dot)ru> writes:
> On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
>> Well, if we were specifying the timeout in microseconds instead of seconds,
>> it would make sense to have better resolution. But when you can only
>> specify the timeout in seconds, the internal time comparison doesn't need
>> to be any more accurate than seconds (IMHO anyway).

> Actually we have the state machine in connectDBComplete() and the timeout is
> set for machine as the whole. Therefore if 1 second timeout is seted for the
> connectDBComplete() the timeout of particualr iteration of loop can be less
> then 1 second.

However, the code's been restructured so that we don't need to keep
track of the exact time spent in any one iteration. The error is only
on the overall delay. I agree with Joe that it's not worth the effort
needed (in the Win32 case) to make the timeout accurate to < 1 sec.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 14:49:26
Message-ID: 200210141449.g9EEnQi07497@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway wrote:
> Bruce Momjian wrote:
> > It could be argued that our seconds are not as exact as they could be
> > with subsecond timing. Not sure it is worth it, but I can see the
> > point.
>
> Well, if we were specifying the timeout in microseconds instead of seconds, it
> would make sense to have better resolution. But when you can only specify the
> timeout in seconds, the internal time comparison doesn't need to be any more
> accurate than seconds (IMHO anyway).
>
> > are doing something with microseconds when we are not. Also, should we
> > switch to a simple time() call, rather than gettimeofday()?
> >
>
> Already done -- that's what Denis is unhappy about.

OK, I see that, but now, we are stuffing everything into a timeval
struct. Does that make sense? Shouldn't we just use time_t? I realize
we need the timeval struct for select() in pqWaitTimed, but we are
making a copy of the timeval we pass in anyway. Seems it would be easier
just making it a time_t.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 15:19:45
Message-ID: 12656.1034608785@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>> Already done -- that's what Denis is unhappy about.

> OK, I see that, but now, we are stuffing everything into a timeval
> struct. Does that make sense? Shouldn't we just use time_t?

Yeah, the code could be simplified now. I'm also still not happy about
the question of whether it's safe to assume tv_sec is signed. I think
the running state should be just finish_time, and then inside the
loop when we are about to wait, we could do

current_time = time(NULL);
if (current_time >= finish_time)
{
// failure exit
}
remains.tv_sec = finish_time - current_time;
remains.tv_usec = 0;
// pass &remains to select...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 15:58:27
Message-ID: 200210141558.g9EFwRw15251@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> >> Already done -- that's what Denis is unhappy about.
>
> > OK, I see that, but now, we are stuffing everything into a timeval
> > struct. Does that make sense? Shouldn't we just use time_t?
>
> Yeah, the code could be simplified now. I'm also still not happy about
> the question of whether it's safe to assume tv_sec is signed. I think
> the running state should be just finish_time, and then inside the
> loop when we are about to wait, we could do
>
> current_time = time(NULL);
> if (current_time >= finish_time)
> {
> // failure exit
> }
> remains.tv_sec = finish_time - current_time;
> remains.tv_usec = 0;
> // pass &remains to select...

That whole remains structure should be a time_t variable, and then we
_know_ we can't assume it is signed. The use of timeval should
happen only in pqWaitTimed because it has to use select().

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 16:10:15
Message-ID: 13208.1034611815@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> That whole remains structure should be a time_t variable, and then we
> _know_ we can't assume it is signed. The use of timeval should
> happen only in pqWaitTimed because it has to use select().

I think it's fine to use struct timeval as the parameter type for
pqWaitTimed. This particular caller of pqWaitTimed has no need for
sub-second wait precision, but that doesn't mean we might not want it
for other purposes later.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-14 17:14:22
Message-ID: 200210141714.g9EHEM926154@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > That whole remains structure should be a time_t variable, and then we
> > _know_ we can't assume it is signed. The use of timeval should
> > happen only in pqWaitTimed because it has to use select().
>
> I think it's fine to use struct timeval as the parameter type for
> pqWaitTimed. This particular caller of pqWaitTimed has no need for
> sub-second wait precision, but that doesn't mean we might not want it
> for other purposes later.

That was a question: whether pqWaitTimed() was something exported by
libpq and therefore something that has an API that shouldn't change. I
see it in libpq-int.h, which I think means it isn't exported, but yes,
there could be later cases where we need subsecond stuff.

I have applied the following patch to get us a little closer to sanity.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 4.0 KB

From: Denis A Ustimenko <denis(at)oldham(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 03:30:01
Message-ID: 20021015033001.GA1438@denis.oldham.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom, excuse me, I forget to copy previous posting to pgsql-hackers(at)postgresql(dot)org(dot)

On Mon, Oct 14, 2002 at 09:53:27AM -0400, Tom Lane wrote:
> Denis A Ustimenko <denis(at)oldham(dot)ru> writes:
> > On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
> >> Well, if we were specifying the timeout in microseconds instead of seconds,
> >> it would make sense to have better resolution. But when you can only
> >> specify the timeout in seconds, the internal time comparison doesn't need
> >> to be any more accurate than seconds (IMHO anyway).
>
> > Actually we have the state machine in connectDBComplete() and the timeout is
> > set for machine as the whole. Therefore if 1 second timeout is seted for the
> > connectDBComplete() the timeout of particualr iteration of loop can be less
> > then 1 second.
>
> However, the code's been restructured so that we don't need to keep
> track of the exact time spent in any one iteration. The error is only
> on the overall delay. I agree with Joe that it's not worth the effort
> needed (in the Win32 case) to make the timeout accurate to < 1 sec.
>

Beware of almost 1 second posiible error. For example: connect_timeout == 1,
we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
do only one iteration. I don't know is it enough to make connection?
True timeout in this case == 0.000001

--
Regards
Denis


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Denis A Ustimenko <denis(at)oldham(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 03:38:55
Message-ID: 200210150338.g9F3ctl16626@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Denis A Ustimenko wrote:
> Tom, excuse me, I forget to copy previous posting to pgsql-hackers(at)postgresql(dot)org(dot)
>
> On Mon, Oct 14, 2002 at 09:53:27AM -0400, Tom Lane wrote:
> > Denis A Ustimenko <denis(at)oldham(dot)ru> writes:
> > > On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
> > >> Well, if we were specifying the timeout in microseconds instead of seconds,
> > >> it would make sense to have better resolution. But when you can only
> > >> specify the timeout in seconds, the internal time comparison doesn't need
> > >> to be any more accurate than seconds (IMHO anyway).
> >
> > > Actually we have the state machine in connectDBComplete() and the timeout is
> > > set for machine as the whole. Therefore if 1 second timeout is seted for the
> > > connectDBComplete() the timeout of particualr iteration of loop can be less
> > > then 1 second.
> >
> > However, the code's been restructured so that we don't need to keep
> > track of the exact time spent in any one iteration. The error is only
> > on the overall delay. I agree with Joe that it's not worth the effort
> > needed (in the Win32 case) to make the timeout accurate to < 1 sec.
> >
>
> Beware of almost 1 second posiible error. For example: connect_timeout == 1,
> we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
> do only one iteration. I don't know is it enough to make connection?
> True timeout in this case == 0.000001

Good question. What is going to happen is that select() is going to be
passed tv_sec = 1, and it is going to sleep for one second. Now, if
select is interrupted, another time() call is going to be made. If the
second ticked just before we run time(), we are going to think one
second has elapsed and return, so in the non-interrupt case, we are
going to be fine, but with select() interrupted, we are going to have
this problem. Now, if we used gettimeofday(), we would be fine, but we
don't have that on Win32 and the ported version of that looks pretty
complicated. Maybe we will go with what we have and see if anyone
complains.

One idea I am looking at is to pass a time_t into pqWaitTimeout, and do
that clock checking in there, and maybe I can do a gettimeofday() for
non-Win32 and a time() on Win32.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 04:01:16
Message-ID: 11258.1034654476@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Denis A Ustimenko wrote:
>> Beware of almost 1 second posiible error. For example: connect_timeout == 1,
>> we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
>> do only one iteration. I don't know is it enough to make connection?
>> True timeout in this case == 0.000001

> Good question. What is going to happen is that select() is going to be
> passed tv_sec = 1, and it is going to sleep for one second. Now, if
> select is interrupted, another time() call is going to be made.

There is a very simple answer to this, which I think I suggested to Joe
originally, but it's not in the code now: the initial calculation of
finish_time = now() + timeout must add one. This ensures that any
roundoff error is in the conservative direction of timing out later,
rather than sooner.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 04:22:33
Message-ID: 200210150422.g9F4MXv20336@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Denis A Ustimenko wrote:
> >> Beware of almost 1 second posiible error. For example: connect_timeout == 1,
> >> we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
> >> do only one iteration. I don't know is it enough to make connection?
> >> True timeout in this case == 0.000001
>
> > Good question. What is going to happen is that select() is going to be
> > passed tv_sec = 1, and it is going to sleep for one second. Now, if
> > select is interrupted, another time() call is going to be made.
>
> There is a very simple answer to this, which I think I suggested to Joe
> originally, but it's not in the code now: the initial calculation of
> finish_time = now() + timeout must add one. This ensures that any
> roundoff error is in the conservative direction of timing out later,
> rather than sooner.

Yes, I saw that and will try to add it back into the code.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 16:31:41
Message-ID: 3DAC42ED.3010603@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>>Good question. What is going to happen is that select() is going to be
>>passed tv_sec = 1, and it is going to sleep for one second. Now, if
>>select is interrupted, another time() call is going to be made.
>
> There is a very simple answer to this, which I think I suggested to Joe
> originally, but it's not in the code now: the initial calculation of
> finish_time = now() + timeout must add one. This ensures that any
> roundoff error is in the conservative direction of timing out later,
> rather than sooner.

Yes, my bad, I guess.

The thing was that with the extra +1, I was repeatedly getting a wall-clock
time of 2 seconds with a timeout set to 1 second. It seemed odd to have my 1
second timeout automatically turned into 2 seconds every time. With the
current code, I tried a timeout of 1 second at least a 100 times and it always
took about 1 full wall-clock second. But I guess if there is some corner case
that needs it...

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 17:08:05
Message-ID: 16915.1034701685@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> The thing was that with the extra +1, I was repeatedly getting a wall-clock
> time of 2 seconds with a timeout set to 1 second. It seemed odd to have my 1
> second timeout automatically turned into 2 seconds every time.

That is odd; seems like you should get between 1 and 2 seconds. How
were you measuring the delay, exactly?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 20:13:02
Message-ID: 200210152013.g9FKD2J06510@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
> > The thing was that with the extra +1, I was repeatedly getting a wall-clock
> > time of 2 seconds with a timeout set to 1 second. It seemed odd to have my 1
> > second timeout automatically turned into 2 seconds every time.
>
> That is odd; seems like you should get between 1 and 2 seconds. How
> were you measuring the delay, exactly?

Remember, that if you add 1, the select() is going to get tv_sec = 2, so
yes, it will be two seconds.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 20:42:18
Message-ID: 18313.1034714538@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> That is odd; seems like you should get between 1 and 2 seconds. How
>> were you measuring the delay, exactly?

> Remember, that if you add 1, the select() is going to get tv_sec = 2, so
> yes, it will be two seconds.

Yeah, but only if the value isn't recalculated shortly later. Consider

caller computes finish_time = time() + timeout;

...

inside select-wait loop, compute max_delay = finish_time - time();

If the time() value has incremented by 1 second between these two lines
of code, you have a problem with a 1-second timeout...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 21:42:04
Message-ID: 200210152142.g9FLg4915445@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> That is odd; seems like you should get between 1 and 2 seconds. How
> >> were you measuring the delay, exactly?
>
> > Remember, that if you add 1, the select() is going to get tv_sec = 2, so
> > yes, it will be two seconds.
>
> Yeah, but only if the value isn't recalculated shortly later. Consider
>
> caller computes finish_time = time() + timeout;
>
> ...
>
> inside select-wait loop, compute max_delay = finish_time - time();
>
> If the time() value has incremented by 1 second between these two lines
> of code, you have a problem with a 1-second timeout...

Yep. If you track finish time, you get that 1 second rounding problem,
and if you track just duration/timeout, you get into the problem of not
knowing when the timeout has ended. I don't think these can be fixed
except by overestimating (+1) or by tracking subseconds along with
seconds so you really know when one second has elapsed.

Perhaps we need to modify a timeout of 1 to be 2 and leave other values
alone.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-15 23:42:02
Message-ID: 3DACA7CA.70509@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>
>> The thing was that with the extra +1, I was repeatedly getting a
>> wall-clock time of 2 seconds with a timeout set to 1 second. It seemed
>> odd to have my 1 second timeout automatically turned into 2 seconds every
>> time.
>
> That is odd; seems like you should get between 1 and 2 seconds. How were
> you measuring the delay, exactly?

OK. I got a little more scientific about my testing. I used a php script,
running on the same machine, to connect/disconnect in a tight loop and timed
successful and unsuccessful connection attempts using microtime().

Here are the results. First with current cvs code:

current cvs libpq code
-----------------------
good connect info, using unix socket, timeout = 1 second:
=========================================================
unsuccessful 69 times: sum 0.41736388206482: avg 0.0060487519139829
successful 9931 times: sum 68.798981308937: avg 0.0069276992557584

good connect info, using hostaddr, timeout = 1 second
=====================================================
unsuccessful 72 times: sum 0.37020063400269: avg 0.0051416754722595
successful 9928 times: sum 75.047878861427: avg 0.0075592142285886

current cvs libpq code - bad hostaadr, using hostaddr, timeout = 1 second
=========================================================================
unsuccessful 100 times: sum 99.975758910179: avg 0.99975758910179
successful 0 times: sum 0: avg n/a

Clearly not good. The timeout code is causing connection failures about 0.7%
of the time. Next are the results using the attached patch. Per Bruce's
suggestion, it only adds 1 if the timeout is set to 1.

with patch libpq code
---------------------
good connect info, using unix socket, timeout = 1 second
========================================================
unsuccessful 0 times: sum 0: avg n/a
successful 10000 times: sum 68.95981669426: avg 0.006895981669426

with patch libpq code - good connect info, using hostaddr, timeout = 1 second
=============================================================================
unsuccessful 0 times: sum 0: avg n/a
successful 10000 times: sum 73.500863552094: avg 0.0073500863552094

with patch libpq code - good connect info, using hostaddr, timeout = 2 seconds
==============================================================================
unsuccessful 0 times: sum 0: avg n/a
successful 10000 times: sum 73.354710936546: avg 0.0073354710936546

with patch libpq code - bad hostaadr, using hostaddr, timeout = 1 second
========================================================================
unsuccessful 100 times: sum 149.98181843758: avg 1.4998181843758
successful 0 times: sum 0: avg n/a

with patch libpq code - bad hostaadr, using hostaddr, timeout = 2 seconds
=========================================================================
unsuccessful 100 times: sum 149.98445630074: avg 1.4998445630074
successful 0 times: sum 0: avg n/a

with patch libpq code - bad hostaadr, using hostaddr, timeout = 3 seconds
=========================================================================
unsuccessful 20 times: sum 59.842629671097: avg 2.9921314835548
successful 0 times: sum 0: avg n/a

With the patch there were 0 failures on 30000 attempts using good connect
information.

If there are no objections, please apply the attached. Otherwise let me know
if you'd like different tests or would like to try other approaches.

Thanks,

Joe

Attachment Content-Type Size
fe-connect-fix.1.patch text/plain 641 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 02:41:39
Message-ID: 20301.1034736099@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> [ some convincing test cases that timeout=1 is not good ]

> remains.tv_sec = atoi(conn->connect_timeout);
> + if (remains.tv_sec == 1)
> + remains.tv_sec += 1;
> if (!remains.tv_sec)
> {
> conn->status = CONNECTION_BAD;

On pure-paranoia grounds, I'd suggest the logic

+ /* force a sane minimum delay */
+ if (remains.tv_sec < 2)
+ remains.tv_sec = 2;

whereupon you could remove the failure check just below.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 02:54:45
Message-ID: 200210160254.g9G2sjG06658@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
> > [ some convincing test cases that timeout=1 is not good ]
>
> > remains.tv_sec = atoi(conn->connect_timeout);
> > + if (remains.tv_sec == 1)
> > + remains.tv_sec += 1;
> > if (!remains.tv_sec)
> > {
> > conn->status = CONNECTION_BAD;
>
> On pure-paranoia grounds, I'd suggest the logic
>
> + /* force a sane minimum delay */
> + if (remains.tv_sec < 2)
> + remains.tv_sec = 2;
>
> whereupon you could remove the failure check just below.

I think we should fail if they set the timeout to zero, rather than
cover it up by setting the delay to two.

Attached is a patch that implements most of what we have discussed:

use time()
get rid of timeval where not needed
allow restart of select() to properly compute remaining time
add 1 to timeout == 1
pass finish time to pqWaitTimed

Patch applied. I am applying it so it is in CVS and everyone can see
it. I will keep modifying it until everyone likes it. It is just
easier to do it that way when multiple people are reviewing it. They
can jump in and make changes too.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 6.0 KB

From: Joe Conway <mail(at)joeconway(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 05:15:27
Message-ID: 3DACF5EF.80602@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Patch applied. I am applying it so it is in CVS and everyone can see
> it. I will keep modifying it until everyone likes it. It is just
> easier to do it that way when multiple people are reviewing it. They
> can jump in and make changes too.

I ran the same test as before, with the following results:

current cvs
-----------
good connect info, using hostaddr, timeout = 1 second
=====================================================
unsuccessful 0 times: avg n/a
successful 50000 times: avg 0.0087

bad connect info, using hostaddr, timeout = 1 second
====================================================
unsuccessful 100 times: avg 1.4998
successful 0 times: sum 0: avg n/a

Seems to work well. But one slight concern:

with previous 2 line patch
--------------------------
good connect info, using hostaddr, timeout = 1 || 2 second(s)
=============================================================
unsuccessful 0 times: avg n/a
successful 20000 times: avg 0.0074

These tests were on the same, otherwise unloaded development box. Not sure if
it is an artifact or not, but the average connection time has gone from 0.0074
to 0.0087, an increase of about 17%. Also worth noting is that there was very
little deviation from connect-to-connect on both of the tests (eye-balled the
total range at about 0.0003). I did not bother calculating standard deviation
of the connect times, but I'm certain it would not be enough to account for
the difference. Could anything in Bruce's patch account for this, or do you
think it is normal variation due to something on my dev box?

Joe


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 05:23:44
Message-ID: 200210160523.g9G5NiS05055@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway wrote:
> Seems to work well. But one slight concern:
>
> with previous 2 line patch
> --------------------------
> good connect info, using hostaddr, timeout = 1 || 2 second(s)
> =============================================================
> unsuccessful 0 times: avg n/a
> successful 20000 times: avg 0.0074
>
> These tests were on the same, otherwise unloaded development box. Not sure if
> it is an artifact or not, but the average connection time has gone from 0.0074
> to 0.0087, an increase of about 17%. Also worth noting is that there was very
> little deviation from connect-to-connect on both of the tests (eye-balled the
> total range at about 0.0003). I did not bother calculating standard deviation
> of the connect times, but I'm certain it would not be enough to account for
> the difference. Could anything in Bruce's patch account for this, or do you
> think it is normal variation due to something on my dev box?

Yes, the new code has _three_ time() calls, rather than the old code
that I think only had two. I was going to mention it but I figured
time() was a pretty light system call, sort of like getpid().

I needed the additional time() calls so the computation of remaining
time was more accurate, i.e. we are not resetting the timer on a
select() EINTR anymore.

Should I try to rework it?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Joe Conway <mail(at)joeconway(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 05:35:58
Message-ID: 3DACFABE.5060005@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Yes, the new code has _three_ time() calls, rather than the old code
> that I think only had two. I was going to mention it but I figured
> time() was a pretty light system call, sort of like getpid().
>
> I needed the additional time() calls so the computation of remaining
> time was more accurate, i.e. we are not resetting the timer on a
> select() EINTR anymore.
>
> Should I try to rework it?
>

I tried two more runs of 10000, and the average is pretty steady at 0.0087.
However the total range is a fair bit wider than I originally reported.

I added a forth time() call to see what the effect would be. It increased the
average to 0.0089 (two runs of 10000 connects each), so I don't think the
time() call explains the entire difference.

Not sure this is worth worrying about or not. I'd guess anyone serious about
keeping connect time to a minimum uses some kind of connection pool or
persistent connection anyway.

Joe


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 05:40:41
Message-ID: 200210160540.g9G5efA07216@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway wrote:
> Bruce Momjian wrote:
> > Yes, the new code has _three_ time() calls, rather than the old code
> > that I think only had two. I was going to mention it but I figured
> > time() was a pretty light system call, sort of like getpid().
> >
> > I needed the additional time() calls so the computation of remaining
> > time was more accurate, i.e. we are not resetting the timer on a
> > select() EINTR anymore.
> >
> > Should I try to rework it?
> >
>
> I tried two more runs of 10000, and the average is pretty steady at 0.0087.
> However the total range is a fair bit wider than I originally reported.
>
> I added a forth time() call to see what the effect would be. It increased the
> average to 0.0089 (two runs of 10000 connects each), so I don't think the
> time() call explains the entire difference.
>
> Not sure this is worth worrying about or not. I'd guess anyone serious about
> keeping connect time to a minimum uses some kind of connection pool or
> persistent connection anyway.

Well, the fact you see a change of 0.0002 is significant. Let me add
that in the old code there was only one time() call _in_ the loop, while
now, there are two, so I can easily see there are several additional
time() calls. Did you put your calls in the while loop?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Joe Conway <mail(at)joeconway(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 06:00:02
Message-ID: 3DAD0062.2040400@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Well, the fact you see a change of 0.0002 is significant. Let me add
> that in the old code there was only one time() call _in_ the loop, while
> now, there are two, so I can easily see there are several additional
> time() calls. Did you put your calls in the while loop?
>

Not the first time, but I moved it around (into the loop in connectDBComplete,
and just before select() in pqWaitTimed) and it didn't seem to make any
difference. Then I removed it entirely again, and still got 0.0089 seconds
average. So, at the least, 0.0002 worth of variation seems to be related to
the development machine itself.

Joe

p.s. The good news is that with tens of thousands more tests at 1 second
timeout, still zero connection failures!


From: Denis A Ustimenko <denis(at)oldham(dot)ru>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 09:50:47
Message-ID: 20021016095047.GA5053@store.oldham.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 14, 2002 at 01:00:07AM -0400, Bruce Momjian wrote:
> Denis A Ustimenko wrote:
> > On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> > > Denis A Ustimenko wrote:
> > > >>Bruce, why have all precise time calculations been droped out in 1.206?
> > > >>If there is no
> > > >>gettimeofday in win32?
> > >
> > > gettimeofday was not portable to win32 (at least not that I could find) and
> > > hence broke the win32 build of the clients.
> > >
> >
> > GetSystemTimeAsFileTime should help.
> >
> > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp
>
> It's not clear to me how we could get this into something we can deal
> with like gettimeofday.
>
> I looked at the Apache APR project, and they have a routine that returns
> the microseconds since 1970 for Unix:
>

Here is my version of gettimeofday for win32. It was tested with Watcom C 11.0c. I think it can be used. I still belive that fine time calculation is the right way.

#include<stdio.h>
#ifdef _WIN32
#include<winsock.h>
#else
#include<sys/time.h>
#endif

main()
{
struct timeval t;
if (gettimeofday(&t,NULL)) {
printf("error\n\r");
} else {
printf("the time is %ld.%ld\n\r", t.tv_sec, t.tv_usec);
}
fflush(stdout);
}

#ifdef _WIN32
int gettimeofday(struct timeval *tp, void *tzp)
{
FILETIME time;
__int64 tmp;

if ( NULL == tp) return -1;

GetSystemTimeAsFileTime(&time);

tmp = time.dwHighDateTime;
tmp <<= 32;
tmp |= time.dwLowDateTime;
tmp /= 10; // it was in 100 nanosecond periods
tp->tv_sec = tmp / 1000000 - 11644473600L; // Windows Epoch begins at 12:00 AM 01.01.1601
tp->tv_usec = tmp % 1000000;
return 0;
}
#endif

--
Regards
Denis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 14:03:35
Message-ID: 23946.1034777015@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Yes, the new code has _three_ time() calls, rather than the old code
> that I think only had two. I was going to mention it but I figured
> time() was a pretty light system call, sort of like getpid().
> I needed the additional time() calls so the computation of remaining
> time was more accurate, i.e. we are not resetting the timer on a
> select() EINTR anymore.

As long as the time() calls aren't invoked in the default no-timeout
case, I doubt that the small additional slowdown matters too much.
Still, one could ask why we are expending extra cycles to make the
timeout more accurate. Who the heck needs an accurate timeout on
connect? Can you really give a use-case where the user won't have
picked a number out of the air anyway?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 14:56:31
Message-ID: 200210161456.g9GEuWm24792@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Yes, the new code has _three_ time() calls, rather than the old code
> > that I think only had two. I was going to mention it but I figured
> > time() was a pretty light system call, sort of like getpid().
> > I needed the additional time() calls so the computation of remaining
> > time was more accurate, i.e. we are not resetting the timer on a
> > select() EINTR anymore.
>
> As long as the time() calls aren't invoked in the default no-timeout
> case, I doubt that the small additional slowdown matters too much.
> Still, one could ask why we are expending extra cycles to make the
> timeout more accurate. Who the heck needs an accurate timeout on
> connect? Can you really give a use-case where the user won't have
> picked a number out of the air anyway?

I think we do need to properly compute the timeout on an EINTR of
select() because if we don't, a 30 second timeout could become 90
seconds if select() is interrupted. The other time() calls are needed,
one above the loop, and one inside the loop.

The only thing I can do is to pass in the end time _and_ the duration to
pqWaitTimeout and do the time() recomputation only on EINTR. This would
compute duration in the loop where I call time() and therefore elminate
a time() call in the normal, non-EINTR case. Of course, it makes the
code more complicated, and that is why I avoided it.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 15:01:33
Message-ID: 200210161501.g9GF1Xs25718@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Yes, the new code has _three_ time() calls, rather than the old code
> > that I think only had two. I was going to mention it but I figured
> > time() was a pretty light system call, sort of like getpid().
> > I needed the additional time() calls so the computation of remaining
> > time was more accurate, i.e. we are not resetting the timer on a
> > select() EINTR anymore.
>
> As long as the time() calls aren't invoked in the default no-timeout
> case, I doubt that the small additional slowdown matters too much.
> Still, one could ask why we are expending extra cycles to make the
> timeout more accurate. Who the heck needs an accurate timeout on
> connect? Can you really give a use-case where the user won't have
> picked a number out of the air anyway?

Yes, the default no-timeout case makes no time() calls.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 15:02:05
Message-ID: 24412.1034780525@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> Still, one could ask why we are expending extra cycles to make the
>> timeout more accurate. Who the heck needs an accurate timeout on
>> connect? Can you really give a use-case where the user won't have
>> picked a number out of the air anyway?

> I think we do need to properly compute the timeout on an EINTR of
> select() because if we don't, a 30 second timeout could become 90
> seconds if select() is interrupted. The other time() calls are needed,
> one above the loop, and one inside the loop.

AFAICS we need one time() call at the start, and then one inside the
select loop. I haven't looked at your recent patches, but you said
something about putting two calls in the loop; that seems like overkill.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Denis A Ustimenko <denis(at)oldham(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 15:02:10
Message-ID: 200210161502.g9GF2Ac25831@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I will keep this in case we need it later. I think we worked around
this problem by having timeout == 1 set equal to 2 so we get at least
one second for the connection.

---------------------------------------------------------------------------

Denis A Ustimenko wrote:
> On Mon, Oct 14, 2002 at 01:00:07AM -0400, Bruce Momjian wrote:
> > Denis A Ustimenko wrote:
> > > On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> > > > Denis A Ustimenko wrote:
> > > > >>Bruce, why have all precise time calculations been droped out in 1.206?
> > > > >>If there is no
> > > > >>gettimeofday in win32?
> > > >
> > > > gettimeofday was not portable to win32 (at least not that I could find) and
> > > > hence broke the win32 build of the clients.
> > > >
> > >
> > > GetSystemTimeAsFileTime should help.
> > >
> > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp
> >
> > It's not clear to me how we could get this into something we can deal
> > with like gettimeofday.
> >
> > I looked at the Apache APR project, and they have a routine that returns
> > the microseconds since 1970 for Unix:
> >
>
> Here is my version of gettimeofday for win32. It was tested with Watcom C 11.0c. I think it can be used. I still belive that fine time calculation is the right way.
>
> #include<stdio.h>
> #ifdef _WIN32
> #include<winsock.h>
> #else
> #include<sys/time.h>
> #endif
>
> main()
> {
> struct timeval t;
> if (gettimeofday(&t,NULL)) {
> printf("error\n\r");
> } else {
> printf("the time is %ld.%ld\n\r", t.tv_sec, t.tv_usec);
> }
> fflush(stdout);
> }
>
> #ifdef _WIN32
> int gettimeofday(struct timeval *tp, void *tzp)
> {
> FILETIME time;
> __int64 tmp;
>
> if ( NULL == tp) return -1;
>
> GetSystemTimeAsFileTime(&time);
>
> tmp = time.dwHighDateTime;
> tmp <<= 32;
> tmp |= time.dwLowDateTime;
> tmp /= 10; // it was in 100 nanosecond periods
> tp->tv_sec = tmp / 1000000 - 11644473600L; // Windows Epoch begins at 12:00 AM 01.01.1601
> tp->tv_usec = tmp % 1000000;
> return 0;
> }
> #endif
>
>
>
> --
> Regards
> Denis
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Denis A Ustimenko <denis(at)oldham(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c
Date: 2002-10-16 15:05:40
Message-ID: 200210161505.g9GF5eZ26278@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> Still, one could ask why we are expending extra cycles to make the
> >> timeout more accurate. Who the heck needs an accurate timeout on
> >> connect? Can you really give a use-case where the user won't have
> >> picked a number out of the air anyway?
>
> > I think we do need to properly compute the timeout on an EINTR of
> > select() because if we don't, a 30 second timeout could become 90
> > seconds if select() is interrupted. The other time() calls are needed,
> > one above the loop, and one inside the loop.
>
> AFAICS we need one time() call at the start, and then one inside the
> select loop. I haven't looked at your recent patches, but you said
> something about putting two calls in the loop; that seems like overkill.

Yes, one call at the start, and one in the loop. We need another in
pqWaitTimeout, but only if we hit EINTR. As the code stands now we do
time() unconditionally in pqWaitTimeout too because we only have to pass
in the funish time. If we want to pass in both finish and duration (for
use as select() timeout param), then we can eliminate the time() call in
there for the non-EINTR case. Is it worth the added code complexity?
That is what I am not sure about.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073