API bug in DetermineTimeZoneOffset()

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: przemek(at)hadapt(dot)com
Subject: API bug in DetermineTimeZoneOffset()
Date: 2013-10-31 16:52:41
Message-ID: 3077.1383238361@sss.pgh.pa.us

DetermineTimeZoneOffset thinks that if the passed pg_tz parameter is
equal to session_timezone, it should pay attention to HasCTZSet/CTimeZone
and allow those to override the pg_tz. The folly of this is revealed by
bug #8572, wherein timestamptz input that explicitly specifies a timezone
name is taken to be in the "brute force" zone CTimeZone, if the named zone
is by chance the same zone as the session_timezone that had prevailed
before the "brute force" zone was set. (A "brute force" timezone setting
is a simple numeric offset from GMT, rather than a named zone, which is
looked up in the Olson database.)

I think that we should change this function to follow the API convention
used by timestamp2tm(), namely that one passes a NULL pointer if one
would like session_timezone/HasCTZSet/CTimeZone to control the result.
A non-null pointer should mean to use that zone specification, period.
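
A minimal sketch of the two conventions may help make the difference
concrete. Everything in it is a hypothetical stand-in, not PostgreSQL
source: SketchZone replaces pg_tz with a single fixed offset, the sketch_*
globals mimic session_timezone/HasCTZSet/CTimeZone, and offsets are taken
as seconds east of UTC purely for illustration. It also assumes that
looking up a zone name can hand back the same pointer that
session_timezone holds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for pg_tz: a named zone with one fixed UTC offset. */
typedef struct
{
    const char *name;
    int         utc_offset_secs;    /* seconds east of UTC (sketch assumption) */
} SketchZone;

/* Stand-ins for the session-level globals discussed above. */
static SketchZone  sketch_session_zone_storage = {"Europe/Warsaw", 3600};
static SketchZone *sketch_session_timezone = &sketch_session_zone_storage;
static bool        sketch_has_ctz_set = false;  /* brute-force offset active? */
static int         sketch_ctimezone = 0;        /* the brute-force offset */

/*
 * Behavior as described above: a pointer that happens to equal
 * session_timezone is silently overridden by the brute-force setting.
 */
static int
offset_old(const SketchZone *tz)
{
    if (tz == sketch_session_timezone && sketch_has_ctz_set)
        return sketch_ctimezone;
    return tz->utc_offset_secs;
}

/*
 * Proposed convention: NULL means "let the session settings decide";
 * any non-NULL zone is used exactly as given.
 */
static int
offset_new(const SketchZone *tz)
{
    if (tz == NULL)
        return sketch_has_ctz_set ? sketch_ctimezone
                                  : sketch_session_timezone->utc_offset_secs;
    return tz->utc_offset_secs;
}
```

With a brute-force offset in effect, offset_old() ignores an explicitly
named zone whenever its pointer matches the session zone, which is the
shape of the problem in bug #8572; offset_new() consults the brute-force
value only when the caller passes NULL.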

This bug is of long standing, so I'm inclined to back-patch the fix.
Now, that's possibly problematic if there are any third-party modules
calling DetermineTimeZoneOffset and passing session_timezone, because
it would mean that they'd stop honoring "brute force" zone settings.
However, I suspect that this feature is practically unused in the field,
else we'd have heard complaints before now. In any case, the possibility
of creating more bugs shouldn't stop us from fixing the bug we've got;
and any other change in DetermineTimeZoneOffset's API would be even
more likely to break third-party modules.

One idea worth thinking about is to set session_timezone to NULL when
HasCTZSet is set true. This would prevent accidental use of an obsoleted
zone setting, and it would also mean that the coding pattern of passing
session_timezone to DetermineTimeZoneOffset would still work and do what
it used to in this scenario. However, it's always been the case up to now
that session_timezone is a valid zone of some sort, and I'm afraid that
there might be code out there that will crash if we set it to NULL.
So I'm inclined not to do this, or at least not in the back branches.

Thoughts?

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, przemek(at)hadapt(dot)com
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-10-31 17:07:17
Message-ID: 20131031170717.GB5809@eldon.alvh.no-ip.org

Tom Lane wrote:

> I think that we should change this function to follow the API convention
> used by timestamp2tm(), namely that one passes a NULL pointer if one
> would like session_timezone/HasCTZSet/CTimeZone to control the result.
> A non-null pointer should mean to use that zone specification, period.

It would be most useful if the API of these functions did not rely on
global variables in this way, at least not on GUC variables. (I think
the HasCTZSet/CTimeZone business is not as bad as session_timezone).
Doing that would ease future work to move all this code to src/common/
from where the ecpg implementation could also pick it up. (AFAIR I
recently noticed that there's also frontend code that wants to do
datetime processing, pg_basebackup maybe?). Last I checked, those
global variables were problematic, and the GUC variables were the
hardest to handle of the bunch. So, perhaps, instead of having the code
check session_timezone explicitly, have the caller pass it down.

This consideration probably shouldn't drive a backpatchable fix,
however.

> This bug is of long standing, so I'm inclined to back-patch the fix.
> Now, that's possibly problematic if there are any third-party modules
> calling DetermineTimeZoneOffset and passing session_timezone, because
> it would mean that they'd stop honoring "brute force" zone settings.
> However, I suspect that this feature is practically unused in the field,
> else we'd have heard complaints before now.

Yep, sounds plausible.

Thanks,

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, przemek(at)hadapt(dot)com
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-10-31 17:30:20
Message-ID: 3938.1383240620@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> ... So, perhaps, instead of having the code
> check session_timezone explicitly, have the caller pass it down.

> This consideration probably shouldn't drive a backpatchable fix,
> however.

Well, it's impossible to do that in a back-patchable way, I'm afraid,
because wouldn't you have to pass all three of
session_timezone/HasCTZSet/CTimeZone?

Actually, it strikes me there might be another way to do this, which is to
get rid of HasCTZSet/CTimeZone entirely in favor of consing up some pseudo
pg_tz structure that represents the desired semantics when we want a
"brute force" setting. I think this code is all leftover from a time when
we used the libc timezone routines and did not have a cozy relationship
with the tz representation --- but that's ancient history now. Let me
go look at that idea. It might break external code that looks directly
at HasCTZSet/CTimeZone, but I'll bet there isn't any.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, przemek(at)hadapt(dot)com
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-10-31 18:54:57
Message-ID: 5556.1383245697@sss.pgh.pa.us

I wrote:
> Actually, it strikes me there might be another way to do this, which is to
> get rid of HasCTZSet/CTimeZone entirely in favor of consing up some pseudo
> pg_tz structure that represents the desired semantics when we want a
> "brute force" setting.

After some study of the pgtz code, it turns out this is absolutely trivial
to do by using the POSIX syntax for time zone names. You can do something
like

set timezone to '<GMT+01:30>-01:30';

where the stuff between angle brackets is pretty much arbitrary and is
used as the zone abbreviation for printout purposes. So for example,
after the above I get

regression=# select now(), timeofday();
               now                |                 timeofday
----------------------------------+-------------------------------------------
 2013-10-31 20:02:54.130828+01:30 | Thu Oct 31 20:02:54.130988 2013 GMT+01:30
(1 row)

(It's worth noting that timeofday() is another operation that doesn't
currently give sane results when HasCTZSet is true --- it prints a time
that matches the brute force zone, with a zone abbreviation that doesn't.)

So what I'm thinking we should do is internally translate SET TIMEZONE
with an interval value into a POSIX-style zone name in the above format,
and then just flush HasCTZSet/CTimeZone and all the special case logic
around them.
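
The translation step could look roughly like the sketch below. It is only
an illustration: sketch_posix_zone_name() is a hypothetical helper, not an
existing PostgreSQL function, it takes the offset in ISO convention
(seconds east of UTC), and it assumes the "GMT+-hh:mm" abbreviation with
seconds appended only when they are nonzero. The one real subtlety is that
the POSIX offset after the angle brackets carries the opposite sign from
the ISO offset inside them.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Build a POSIX-style zone name such as "<GMT+01:30>-01:30" from an
 * ISO-convention offset (seconds east of UTC).  Hypothetical helper for
 * illustration only.
 */
static void
sketch_posix_zone_name(int iso_offset_secs, char *buf, size_t buflen)
{
    int     abs_secs = abs(iso_offset_secs);
    int     hh = abs_secs / 3600;
    int     mm = (abs_secs / 60) % 60;
    int     ss = abs_secs % 60;
    /* The abbreviation keeps the ISO sign; the POSIX offset inverts it. */
    char    iso_sign = (iso_offset_secs < 0) ? '-' : '+';
    char    posix_sign = (iso_offset_secs < 0) ? '+' : '-';
    char    hms[16];

    /* Emit hh:mm always, and :ss only when there is a seconds part. */
    if (ss != 0)
        snprintf(hms, sizeof(hms), "%02d:%02d:%02d", hh, mm, ss);
    else
        snprintf(hms, sizeof(hms), "%02d:%02d", hh, mm);

    snprintf(buf, buflen, "<GMT%c%s>%c%s", iso_sign, hms, posix_sign, hms);
}
```

Passing 5400 (ISO +01:30) fills the buffer with <GMT+01:30>-01:30, the
same string used in the SET example above, and passing -5400 gives
<GMT-01:30>+01:30.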

This would create a couple of user-visible incompatibilities:

1. The zone abbreviation printed by timeofday(), to_char(), etc would now
become "GMT+-hh[:mm[:ss]]" (or whatever we choose to stick in the angle
brackets, but that's what I'm thinking). However, since those
abbreviation printouts were just completely wrong before, this doesn't
seem like something people could complain about.

2. The value printed by SHOW TIMEZONE would change format. Now you
get

regression=# set time zone '-1.5';
SET
regression=# show time zone;
 TimeZone
-----------
 -01:30:00
(1 row)

and what I'm proposing is to let it print the POSIX zone name, which
in this case would be <GMT-01:30>+01:30 (note the sign incompatibility
between POSIX and ISO ...). If anybody is sufficiently bothered by this
then we could add a kludge to show_timezone to replicate the old
printout, but I doubt it's a big deal. Again, we know that very darn
few people are using the brute-force zone feature at all (else we'd have
heard complaints sooner), so how many apps are likely to care about the
exact format of SHOW TIME ZONE output for this case?

An intermediate position would be to include the printout kludge in
the back-branch patches and then take it out in HEAD, so that the
change in printout behavior only appears as of 9.4.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, przemek(at)hadapt(dot)com
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-10-31 19:33:53
Message-ID: CA+TgmoYSKeZ_2poo6cDJRjygrMQzkFN9px8HcrPF7bHYgzzVhA@mail.gmail.com

On Thu, Oct 31, 2013 at 2:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Actually, it strikes me there might be another way to do this, which is to
>> get rid of HasCTZSet/CTimeZone entirely in favor of consing up some pseudo
>> pg_tz structure that represents the desired semantics when we want a
>> "brute force" setting.
>
> After some study of the pgtz code, it turns out this is absolutely trivial
> to do by using the POSIX syntax for time zone names. You can do something
> like
>
> set timezone to '<GMT+01:30>-01:30';
>
> where the stuff between angle brackets is pretty much arbitrary and is
> used as the zone abbreviation for printout purposes. So for example,
> after the above I get
>
> regression=# select now(), timeofday();
>                now                |                 timeofday
> ----------------------------------+-------------------------------------------
>  2013-10-31 20:02:54.130828+01:30 | Thu Oct 31 20:02:54.130988 2013 GMT+01:30
> (1 row)
>
> (It's worth noting that timeofday() is another operation that doesn't
> currently give sane results when HasCTZSet is true --- it prints a time
> that matches the brute force zone, with a zone abbreviation that doesn't.)
>
> So what I'm thinking we should do is internally translate SET TIMEZONE
> with an interval value into a POSIX-style zone name in the above format,
> and then just flush HasCTZSet/CTimeZone and all the special case logic
> around them.
>
> This would create a couple of user-visible incompatibilities:
>
> 1. The zone abbreviation printed by timeofday(), to_char(), etc would now
> become "GMT+-hh[:mm[:ss]]" (or whatever we choose to stick in the angle
> brackets, but that's what I'm thinking). However, since those
> abbreviation printouts were just completely wrong before, this doesn't
> seem like something people could complain about.
>
> 2. The value printed by SHOW TIMEZONE would change format. Now you
> get
>
> regression=# set time zone '-1.5';
> SET
> regression=# show time zone;
>  TimeZone
> -----------
>  -01:30:00
> (1 row)
>
> and what I'm proposing is to let it print the POSIX zone name, which
> in this case would be <GMT-01:30>+01:30 (note the sign incompatibility
> between POSIX and ISO ...). If anybody is sufficiently bothered by this
> then we could add a kludge to show_timezone to replicate the old
> printout, but I doubt it's a big deal. Again, we know that very darn
> few people are using the brute-force zone feature at all (else we'd have
> heard complaints sooner), so how many apps are likely to care about the
> exact format of SHOW TIME ZONE output for this case?
>
> An intermediate position would be to include the printout kludge in
> the back-branch patches and then take it out in HEAD, so that the
> change in printout behavior only appears as of 9.4.

I think it's pretty important to avoid user-visible changes in the
back-branches, except to the minimum extent necessary to fix overtly
wrong behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, przemek(at)hadapt(dot)com
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-11-01 03:50:42
Message-ID: 30339.1383277842@sss.pgh.pa.us

I wrote:
> So what I'm thinking we should do is internally translate SET TIMEZONE
> with an interval value into a POSIX-style zone name in the above format,
> and then just flush HasCTZSet/CTimeZone and all the special case logic
> around them.

Attached is a set of proposed patches for this.

> 1. The zone abbreviation printed by timeofday(), to_char(), etc would now
> become "GMT+-hh[:mm[:ss]]" (or whatever we choose to stick in the angle
> brackets, but that's what I'm thinking).

After experimentation it seems that the best idea is to make the displayed
abbreviation be just "+-hh[:mm[:ss]]", using ISO sign convention. This
avoids changes in behavior in places where the code already prints
something reasonable for the timezone name.

The first attached patch just causes session_timezone to point to a pg_tz
struct defined along these lines whenever HasCTZSet is true, plus removing
the very bogus lookaside check in DetermineTimeZoneOffset(). As exhibited
in the added regression tests, this fixes the behavior complained of in
bug #8572. It also makes timeofday() return self-consistent data when a
brute-force timezone is in use. AFAICT it doesn't have any other visible
effects, and it shouldn't break any third-party modules. So I propose
back-patching this to all active branches.

The second attached patch, to be applied after the first, removes the
existing checks of HasCTZSet in the backend. The only visible effect of
this, AFAICT, is that to_char's TZ format spec now delivers something
useful instead of an empty string when a brute-force timezone is in use.
I could be persuaded either way as to whether to back-patch this part.
From one standpoint, this to_char behavioral change is clearly a bug fix
--- but it's barely possible that somebody out there thought that
returning an empty string for TZ was actually the intended/desirable
behavior.

The third patch, to be applied last, nukes HasCTZSet/CTimeZone entirely.
The only consequence I can see at SQL level is that SHOW TIMEZONE will
return a POSIX zone spec instead of an interval value for a brute-force
zone setting (cf change in regression test output). Since the bare
interval value isn't actually something that SET TIMEZONE will accept,
this is clearly a step forward in self-consistency, but it's not something
I would propose to back-patch. In any case we probably don't want to
remove these variables in back branches, on the off chance that some
third-party code is looking at them.

Comments?

regards, tom lane

Attachment                      Content-Type  Size
brute-force-zone-fixes-1.patch  text/x-diff   5.2 KB
brute-force-zone-fixes-2.patch  text/x-diff   4.9 KB
brute-force-zone-fixes-3.patch  text/x-diff   8.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, przemek(at)hadapt(dot)com
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-11-01 14:50:27
Message-ID: 11059.1383317427@sss.pgh.pa.us

I wrote:
> The second attached patch, to be applied after the first, removes the
> existing checks of HasCTZSet in the backend. The only visible effect of
> this, AFAICT, is that to_char's TZ format spec now delivers something
> useful instead of an empty string when a brute-force timezone is in use.
> I could be persuaded either way as to whether to back-patch this part.
> From one standpoint, this to_char behavioral change is clearly a bug fix
> --- but it's barely possible that somebody out there thought that
> returning an empty string for TZ was actually the intended/desirable
> behavior.

Any opinions about whether to back-patch this part or not? It seems
like a bug fix, but on the other hand, I don't recall any complaints
from the field about to_char's TZ spec not working with brute-force zones.
So maybe the prudent thing is to leave well enough alone.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Przemyslaw Pastuszka <przemek(at)hadapt(dot)com>
Subject: Re: API bug in DetermineTimeZoneOffset()
Date: 2013-11-01 19:21:48
Message-ID: CA+TgmoYY-mZHSiE=cTjp0q6Wr_8zUVvtx2QzVPMtuX=6=3U+vw@mail.gmail.com

On Fri, Nov 1, 2013 at 10:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> The second attached patch, to be applied after the first, removes the
>> existing checks of HasCTZSet in the backend. The only visible effect of
>> this, AFAICT, is that to_char's TZ format spec now delivers something
>> useful instead of an empty string when a brute-force timezone is in use.
>> I could be persuaded either way as to whether to back-patch this part.
>> From one standpoint, this to_char behavioral change is clearly a bug fix
>> --- but it's barely possible that somebody out there thought that
>> returning an empty string for TZ was actually the intended/desirable
>> behavior.
>
> Any opinions about whether to back-patch this part or not? It seems
> like a bug fix, but on the other hand, I don't recall any complaints
> from the field about to_char's TZ spec not working with brute-force zones.
> So maybe the prudent thing is to leave well enough alone.

I vote for leaving it alone until somebody complains.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company