Re: Proposal for better support of time-varying timezone abbreviations

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-05 21:33:20
Message-ID: 27083.1412544800@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I got interested in the problem discussed in
http://www.postgresql.org/message-id/20714.1412456604@sss.pgh.pa.us
to wit:

> It's becoming clear to me that our existing design whereby zone
> abbreviations represent fixed GMT offsets isn't really good enough.
> I've been wondering whether we could change things so that, for instance,
> "EDT" means "daylight time according to America/New_York" and the system
> would consult the zic database to find out what the prevailing GMT offset
> was in that zone on that date. This would be a lot more robust in the
> face of the kind of foolishness we now see actually goes on.

Here is a fairly detailed design sketch for a solution:

1. Allow tznames entries to consist of an abbreviation and the name of
a zic timezone, for example

MSK Europe/Moscow

instead of the current scheme whereby an abbreviation is defined by a
daylight-savings flag and a numeric GMT offset. When an abbreviation is
defined this way, the implied offset and DST flag are looked up
dynamically as described below. (In my message quoted above, I'd imagined
that we'd write a DST flag and a zone name, but it turns out this does not
work because there are cases where the DST property has changed over time.
Yes, really. So this design mandates that we derive the DST flag by
looking into the zic timezone data.) Note that we'll still allow the old
style of entries, and indeed prefer that way for cases where an
abbreviation has never changed meaning, because:

* We need that anyway for forwards compatibility of existing custom
abbreviations files.

* It's a lot cheaper to interpret a fixed-meaning zone abbreviation using
the existing logic than to do it as I propose here, so we shouldn't spend
the extra cycles unless necessary.

* Converting every one of the existing abbreviation-file entries would be
really tedious, so I don't want to do it where not necessary.

Also note that this doesn't touch the aspect of the existing design
whereby there are multiple potential abbreviations files. We still have
the problem that the same abbreviation can be in use in different
timezones, so we have to let users configure which zone they mean by a
given abbreviation.

2. To interpret such an abbreviation in the context of timestamptz input,
look up the referenced zic timezone, and use the meaning of the
abbreviation that prevailed at or most recently before the local time
indicated by the rest of the timestamptz string. If the abbreviation was
never used before that time in the given zone, use its earliest later
interpretation; or if it was never used at all (ie bad configuration file)
throw error. Note that this is different from what happens if you give
the underlying zone name directly. It's always been the case that you
could say, for instance, "EST" to force interpretation of a datetime as
standard time even when DST is in force, or "EDT" to force the opposite
interpretation, and this definition preserves that behavior.

3. In the context of timetz input, we only have a time of day not a full
datetime to look at, so it's not entirely clear what to do. We could
throw an error, but that would result in rejecting some inputs currently
considered valid. Perhaps we don't really care, since we consider timetz
a deprecated type anyway. If that doesn't seem OK, we could assume
today's date and the given time-of-day and look up the abbreviation's
meaning as described above. This would mean that the meaning of, say,
'15:00 MSK'::timetz would change over time --- but that happens now,
whenever we change the contents of the abbreviations file entry for MSK,
so maybe this isn't as horrid as it sounds.

4. I've eyeballed the relevant code a bit, and it seems that the only
implementation aspect that isn't perfectly straightforward is figuring
out how to cram a zic timezone reference into a datetkn table entry.
I suggest that before tackling this feature proper, we bring struct
datetkn into the 21st century by widening it from 12 to 16 bytes, along
the lines of

typedef struct
{
char token[TOKMAXLEN + 1]; /* now always null-terminated */
char type;
int32 value;
} datetkn;

and getting rid of all of the very crufty code that deals with
non-null-terminated token strings and cramming values that don't really
fit into a char-sized field into "value". (We might save more code bytes
that way than we spend on the wider token-table entries :-( ... and we'll
certainly make the code less ugly.) Having done that, the "value" can be
large enough to be an index into additional storage appended to a
TimeZoneAbbrevTable. I imagine it pointing at a struct like this:

struct DynamicTimeZoneAbbrev
{
const pg_tz *tz; /* zic timezone, or NULL if not yet looked up */
char name[1]; /* zone name (variable length string)
};

We'd resolve the timezone name into a pg_tz pointer only upon first use of
a dynamic abbreviation, since we don't want to force loading of every zone
referenced in the configuration file at startup; many sessions wouldn't
ever use them.

(I also considered just allowing struct datetkn to contain a pointer; but
adding a union would make initialization of constant datetkn arrays more
notationally painful, and perhaps impossible with older C compilers.)

5. It's worth debating whether we should back-patch such a change.
It certainly is a feature addition, and as such not something we'd
normally consider back-patching, but:

* Those time-varying zone abbreviations are out there whether we like
it or not. As Bruce noted in the other thread, this is going to be
a pain point for a lot of people, particularly in Russia.

* Our maintenance processes for the timezone data files assume that we
can back-patch the same change into all active branches. It'll be a lot
more tedious and error-prone if we can only use this feature in the most
recent branches.

So I'm inclined to propose not merely doing this, but back-patching
into all supported branches. I can see that there might be consensus
against that though.

Thoughts, objections, better ideas?

regards, tom lane


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-05 22:29:54
Message-ID: 5431C662.4010402@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/10/14 10:33, Tom Lane wrote:
> I got interested in the problem discussed in
> http://www.postgresql.org/message-id/20714.1412456604@sss.pgh.pa.us
> to wit:
>
>> It's becoming clear to me that our existing design whereby zone
>> abbreviations represent fixed GMT offsets isn't really good enough.
>> I've been wondering whether we could change things so that, for instance,
>> "EDT" means "daylight time according to America/New_York" and the system
>> would consult the zic database to find out what the prevailing GMT offset
>> was in that zone on that date. This would be a lot more robust in the
>> face of the kind of foolishness we now see actually goes on.
> Here is a fairly detailed design sketch for a solution:
>
> 1. Allow tznames entries to consist of an abbreviation and the name of
> a zic timezone, for example
>
> MSK Europe/Moscow
>
> instead of the current scheme whereby an abbreviation is defined by a
> daylight-savings flag and a numeric GMT offset. When an abbreviation is
> defined this way, the implied offset and DST flag are looked up
> dynamically as described below. (In my message quoted above, I'd imagined
> that we'd write a DST flag and a zone name, but it turns out this does not
> work because there are cases where the DST property has changed over time.
> Yes, really. So this design mandates that we derive the DST flag by
> looking into the zic timezone data.) Note that we'll still allow the old
> style of entries, and indeed prefer that way for cases where an
> abbreviation has never changed meaning, because:
>
> * We need that anyway for forwards compatibility of existing custom
> abbreviations files.
>
> * It's a lot cheaper to interpret a fixed-meaning zone abbreviation using
> the existing logic than to do it as I propose here, so we shouldn't spend
> the extra cycles unless necessary.
>
> * Converting every one of the existing abbreviation-file entries would be
> really tedious, so I don't want to do it where not necessary.
>
> Also note that this doesn't touch the aspect of the existing design
> whereby there are multiple potential abbreviations files. We still have
> the problem that the same abbreviation can be in use in different
> timezones, so we have to let users configure which zone they mean by a
> given abbreviation.
>
> 2. To interpret such an abbreviation in the context of timestamptz input,
> look up the referenced zic timezone, and use the meaning of the
> abbreviation that prevailed at or most recently before the local time
> indicated by the rest of the timestamptz string. If the abbreviation was
> never used before that time in the given zone, use its earliest later
> interpretation; or if it was never used at all (ie bad configuration file)
> throw error. Note that this is different from what happens if you give
> the underlying zone name directly. It's always been the case that you
> could say, for instance, "EST" to force interpretation of a datetime as
> standard time even when DST is in force, or "EDT" to force the opposite
> interpretation, and this definition preserves that behavior.
>
> 3. In the context of timetz input, we only have a time of day not a full
> datetime to look at, so it's not entirely clear what to do. We could
> throw an error, but that would result in rejecting some inputs currently
> considered valid. Perhaps we don't really care, since we consider timetz
> a deprecated type anyway. If that doesn't seem OK, we could assume
> today's date and the given time-of-day and look up the abbreviation's
> meaning as described above. This would mean that the meaning of, say,
> '15:00 MSK'::timetz would change over time --- but that happens now,
> whenever we change the contents of the abbreviations file entry for MSK,
> so maybe this isn't as horrid as it sounds.
>
> 4. I've eyeballed the relevant code a bit, and it seems that the only
> implementation aspect that isn't perfectly straightforward is figuring
> out how to cram a zic timezone reference into a datetkn table entry.
> I suggest that before tackling this feature proper, we bring struct
> datetkn into the 21st century by widening it from 12 to 16 bytes, along
> the lines of
>
> typedef struct
> {
> char token[TOKMAXLEN + 1]; /* now always null-terminated */
> char type;
> int32 value;
> } datetkn;
>
> and getting rid of all of the very crufty code that deals with
> non-null-terminated token strings and cramming values that don't really
> fit into a char-sized field into "value". (We might save more code bytes
> that way than we spend on the wider token-table entries :-( ... and we'll
> certainly make the code less ugly.) Having done that, the "value" can be
> large enough to be an index into additional storage appended to a
> TimeZoneAbbrevTable. I imagine it pointing at a struct like this:
>
> struct DynamicTimeZoneAbbrev
> {
> const pg_tz *tz; /* zic timezone, or NULL if not yet looked up */
> char name[1]; /* zone name (variable length string)
> };
>
> We'd resolve the timezone name into a pg_tz pointer only upon first use of
> a dynamic abbreviation, since we don't want to force loading of every zone
> referenced in the configuration file at startup; many sessions wouldn't
> ever use them.
>
> (I also considered just allowing struct datetkn to contain a pointer; but
> adding a union would make initialization of constant datetkn arrays more
> notationally painful, and perhaps impossible with older C compilers.)
>
> 5. It's worth debating whether we should back-patch such a change.
> It certainly is a feature addition, and as such not something we'd
> normally consider back-patching, but:
>
> * Those time-varying zone abbreviations are out there whether we like
> it or not. As Bruce noted in the other thread, this is going to be
> a pain point for a lot of people, particularly in Russia.
>
> * Our maintenance processes for the timezone data files assume that we
> can back-patch the same change into all active branches. It'll be a lot
> more tedious and error-prone if we can only use this feature in the most
> recent branches.
>
> So I'm inclined to propose not merely doing this, but back-patching
> into all supported branches. I can see that there might be consensus
> against that though.
>
> Thoughts, objections, better ideas?
>
> regards, tom lane
>
>
What I am going to discuss may be way too complicated to implement (or
impractical for other reasons!), Ibut I feel that I should at least
mention it - because it might (does?) address real problems (I've been
bitten by this kind of problem in the past).

In a totally different context relating to insurance quotes, I devised a
scheme to use both an /effective_date/ & an /as_at_date/. How these
concepts might be implemented in pg, in this instance, is likely to be
very different from what I did originally.

I have not checked, but I suspect that pg probably already uses an
/effective_date/ to control when changes to daylight saving date/time's
come into affect (such as a change in the date that the transition to
daylight saving takes effect). If not, then maybe this should be
considered. This could also be used, if it was desired to use the
appropriate abbreviation and offset valid at date/time where it was
different to that defined at the current date/time.

The use of an /as_at_date/ is far more problematic. The idea relates to
how existing date/times should be treated with respect to the date/time
that a pg database is updated with new time zone data files. In the
simplest form: there would be a function in pg that would return the
date/time a new time zone data file was entered into the system, so that
application software can manually correct when the stored GMT date/time
was stored incorrectly because the wring GMT offset was used due to the
updated time zone data files not being in place. Alternatively, pg
could offer to do the correction in a one-off action at the time the new
zone data files were updated.

Cheers,
Gavin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-05 22:42:03
Message-ID: 28563.1412548923@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> writes:
> The use of an /as_at_date/ is far more problematic. The idea relates to
> how existing date/times should be treated with respect to the date/time
> that a pg database is updated with new time zone data files. In the
> simplest form: there would be a function in pg that would return the
> date/time a new time zone data file was entered into the system, so that
> application software can manually correct when the stored GMT date/time
> was stored incorrectly because the wring GMT offset was used due to the
> updated time zone data files not being in place. Alternatively, pg
> could offer to do the correction in a one-off action at the time the new
> zone data files were updated.

Right now there's basically no way to do something like that, since what
we store for timestamptz is just a UTC time instant, with no record of
what GMT offset was involved much less exactly how the offset was
specified in the input. We'd probably have to (at least) double the
on-disk size of timestamptz values to record that ... which seems like a
mighty high price to pay to fix a corner case. Not to mention that
nobody's going to be willing to break on-disk compatibility of timestamptz
for this.

In any case, my proposal is just about being able to correctly interpret
historical timezone abbreviations during input, not about changing what
we store as datetime values.

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-06 23:19:34
Message-ID: 54332386.7050606@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/5/14, 5:42 PM, Tom Lane wrote:
> Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> writes:
>> The use of an /as_at_date/ is far more problematic. The idea relates to
>> how existing date/times should be treated with respect to the date/time
>> that a pg database is updated with new time zone data files. In the
>> simplest form: there would be a function in pg that would return the
>> date/time a new time zone data file was entered into the system, so that
>> application software can manually correct when the stored GMT date/time
>> was stored incorrectly because the wring GMT offset was used due to the
>> updated time zone data files not being in place. Alternatively, pg
>> could offer to do the correction in a one-off action at the time the new
>> zone data files were updated.
>
> Right now there's basically no way to do something like that, since what
> we store for timestamptz is just a UTC time instant, with no record of
> what GMT offset was involved much less exactly how the offset was
> specified in the input. We'd probably have to (at least) double the
> on-disk size of timestamptz values to record that ... which seems like a
> mighty high price to pay to fix a corner case. Not to mention that
> nobody's going to be willing to break on-disk compatibility of timestamptz
> for this.

FWIW, I agree for timestamptz, but I do wish we had a timestamp datatype that stored the exact timezone in effect when the data was entered. That can really, REALLY save your rear if you screw up either timezone in postgresql.conf, or the server's timezone. The part that seems hard (at least to me) is the question of how to actually store the timezone, because I don't think storing the text string "America/Central" is going to cut it. :/
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: <mtweber(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-07 16:48:34
Message-ID: 54341962.7030202@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/6/14, 6:19 PM, Jim Nasby wrote:
> FWIW, I agree for timestamptz, but I do wish we had a timestamp datatype that stored the exact timezone in effect when the data was entered. That can really, REALLY save your rear if you screw up either timezone in postgresql.conf, or the server's timezone. The part that seems hard (at least to me) is the question of how to actually store the timezone, because I don't think storing the text string "America/Central" is going to cut it. :/

For the archives... there's an extension that does what I'd been talking about: http://pgxn.org/dist/timestampandtz/.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-07 22:05:20
Message-ID: 28124.1412719520@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> 4. I've eyeballed the relevant code a bit, and it seems that the only
> implementation aspect that isn't perfectly straightforward is figuring
> out how to cram a zic timezone reference into a datetkn table entry.
> I suggest that before tackling this feature proper, we bring struct
> datetkn into the 21st century by widening it from 12 to 16 bytes, along
> the lines of

> typedef struct
> {
> char token[TOKMAXLEN + 1]; /* now always null-terminated */
> char type;
> int32 value;
> } datetkn;

> and getting rid of all of the very crufty code that deals with
> non-null-terminated token strings and cramming values that don't really
> fit into a char-sized field into "value". (We might save more code bytes
> that way than we spend on the wider token-table entries :-( ... and we'll
> certainly make the code less ugly.)

Attached is a proposed patch for this part. It turned out to be quite
a bit more exciting (not in a good way) than I'd expected. Somewhere
along the way, somebody decided that the easiest way to get from point A
to point B was to insert a minus sign into the FROMVAL() macro, meaning
that FROMVAL() and TOVAL() were not inverses as one would rationally
expect. There were various ways that one could deal with this bit of
dirty laundry; in particular I considered flipping the sign definition
of TZ/DTZ entry values. It seemed better in the end to keep the sign
the same (matching the IANA code's convention for timezone offset sign)
and negate the results at the call sites where needed. Since, in another
bit of weird design, all those call sites already had a multiplication
by MINS_PER_HOUR (the wrong spelling of "60", btw, but I digress) that
needed to be got rid of, this didn't result in touching more places
than I would have had to anyway.

Much worse for the present effort: I was reminded that ecpg contains
a *hard wired* list of known zone abbreviations and their GMT offsets;
a list that doesn't seem to have been updated since around 2003. I'm
not sure what a reasonable fix would be for that. ecpg can't assume it
has access to the timezone database, probably, so bringing it up to
speed with what I propose to do in the backend doesn't seem feasible.
For the moment I didn't change the data there, even though a lot of
the entries are obsolete.

The attached patch also fixes a possible free()-of-uninitialized-pointer
problem in PGTYPEStimestamp_defmt_scan().

regards, tom lane

Attachment Content-Type Size
rationalize-datetkn-1.patch text/x-diff 51.2 KB

From: Chris Bandy <bandy(dot)chris(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-09 14:49:05
Message-ID: CAMDg7Wwx-CjCj=XzkDhic+2O24-WNt2iv5fWdt=WusUfzHT+Sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

typedef struct
> {
> ! char token[TOKMAXLEN + 1]; /* now always null-terminated */
> char type;
> ! int32 value;
> } datetkn;

Being entirely new to this code, "now" makes me think of the "current
timestamp". I think this word can be removed to reduce ambiguity.

+ /* use strncmp so that we match truncated tokens */
> result = strncmp(key, position->token, TOKMAXLEN);

In your proposal you wanted to remove "crufty code that deals with
non-null-terminated token strings". Is this some of that crufty code? Can
it be removed?

-- Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Bandy <bandy(dot)chris(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-09 15:08:37
Message-ID: 25952.1412867317@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Chris Bandy <bandy(dot)chris(at)gmail(dot)com> writes:
> On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> + /* use strncmp so that we match truncated tokens */
>> result = strncmp(key, position->token, TOKMAXLEN);

> In your proposal you wanted to remove "crufty code that deals with
> non-null-terminated token strings". Is this some of that crufty code? Can
> it be removed?

Yeah, I had hoped to simplify these things to just strcmp, but on closer
inspection that didn't work, because some of the keywords in the table are
truncated at TOKMAXLEN (10 characters). If we just made these strcmp then
the code would stop recognizing e.g. "milliseconds".

I thought briefly about widening TOKMAXLEN so that there were no truncated
keywords in the table, but that seems risky from a backwards-compatibility
standpoint: as it stands, the code will accept "milliseconds",
"millisecond", or "millisecon", and there might possibly be applications
out there that depend on that. In any case I'd much rather keep the array
stride at 16 bytes for speed reasons; and who's to say we might not put in
some even-longer keywords in the future?

Another alternative we should maybe consider is leaving the definition
of the token field alone (ie, still not guaranteed null terminated)
which'd leave us with one free byte per datetkn entry. I can't think
of a likely reason to need another 1-byte field though, and the existing
definition is not without risk. That comment that was there about "don't
change this to strlcpy" was there because somebody broke it awhile back,
or at least submitted a patch that would've broken it if it'd been
accepted. People are too used to null-terminated strings in C; a field
definition that violates that norm is just trouble waiting to happen.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-14 22:26:24
Message-ID: 27680.1413325584@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I got interested in the problem discussed in
> http://www.postgresql.org/message-id/20714.1412456604@sss.pgh.pa.us
> to wit:
>> It's becoming clear to me that our existing design whereby zone
>> abbreviations represent fixed GMT offsets isn't really good enough.
>> I've been wondering whether we could change things so that, for instance,
>> "EDT" means "daylight time according to America/New_York" and the system
>> would consult the zic database to find out what the prevailing GMT offset
>> was in that zone on that date. This would be a lot more robust in the
>> face of the kind of foolishness we now see actually goes on.

Attached is an updated patch for this, incorporating my previous work on
changing the representation of datetkn. The code changes are complete
I believe, but I've not touched the user documentation yet, nor updated
the tznames data files except for changing MSK for testing purposes.

Some notes:

* There wasn't any reasonable way to return the required information about
a dynamic zone abbreviation from DecodeSpecial(). However, on looking
closely I found that of the existing callers of DecodeSpecial(), only two
actually relied on it to find both timezone abbreviations and regular
keywords. The other callers only wanted one case or the other. So it
seemed to me that it'd be best to split the abbreviation-lookup behavior
apart from keyword-lookup. DecodeSpecial() now does only the latter, and
there's a new function DecodeTimezoneAbbrev() to do the former. This
avoids touching any code that doesn't care about timezone abbreviations,
and should be a bit faster for those cases too (but about the same speed
otherwise).

* I found that there were pre-existing bugs in DetermineTimeZoneOffset()
for cases in which a zone changed offset but neither the preceding nor
following time segment was marked as DST time. This caused strange
behaviors for cases like Europe/Moscow's 2011 and 2014 time changes.
This is not terribly surprising because we never thought about zone
changes other than simple DST spring-forward/fall-back changes when
that code was written.

* I've not touched ecpg except for cosmetic changes to keep the struct
definitions in sync, and to fix the previously-mentioned bogus free()
attempt. I doubt that it would be worth teaching ecpg how to access the
zic timezone database --- the problem of configuring where to find those
files seems like more trouble than it's worth given the lack of
complaints. I'm not sure what we should do about the obsolete timezone
abbreviations in its table.

* timetz_zone() and DecodeTimeOnly() are not terribly consistent about
how they resolve timezones when not given a date. The former resolves
based on the value of time(NULL), which is a moving target even within
a transaction (which is why the function is marked volatile I guess).
The latter gets today's date as of start of transaction (so at least
it's stable) and then merges that m/d/y with the h/m/s from the time
value. That behavior does not seem terribly well thought out either:
on the day of, or day before or after, a DST change it's likely to
produce strange results, especially if the session timezone is different
from the zone specified in the input. I think we ought to consider
changing one or both of these, though it's not material for back-patching.

* In HEAD, we might want to extend the pg_timezone_abbrevs view to have
a column that shows the underlying zone for a dynamic abbreviation.
The attached patch just shows the abbreviation's behavior as of current
time (defined as now()), which is sort of good enough but certainly not
the whole truth.

Comments?

regards, tom lane

Attachment Content-Type Size
dynamic-timezone-abbrevs-wip-1.patch text/x-diff 128.7 KB

From: Michael Meskes <meskes(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-15 12:31:30
Message-ID: 543E6922.4000105@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.10.2014 00:26, Tom Lane wrote:
> * I've not touched ecpg except for cosmetic changes to keep the struct
> definitions in sync, and to fix the previously-mentioned bogus free()
> attempt. I doubt that it would be worth teaching ecpg how to access the
> zic timezone database --- the problem of configuring where to find those
> files seems like more trouble than it's worth given the lack of
> complaints. I'm not sure what we should do about the obsolete timezone
> abbreviations in its table.

Maybe we should just remove thme for the new release. Yes, that might
break some applications, but then the server doesn't know these either,
so the applications might break anyway.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Meskes <meskes(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-15 13:50:16
Message-ID: 2780.1413381016@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Meskes <meskes(at)postgresql(dot)org> writes:
> On 15.10.2014 00:26, Tom Lane wrote:
>> * I've not touched ecpg except for cosmetic changes to keep the struct
>> definitions in sync, and to fix the previously-mentioned bogus free()
>> attempt. I doubt that it would be worth teaching ecpg how to access the
>> zic timezone database --- the problem of configuring where to find those
>> files seems like more trouble than it's worth given the lack of
>> complaints. I'm not sure what we should do about the obsolete timezone
>> abbreviations in its table.

> Maybe we should just remove thme for the new release. Yes, that might
> break some applications, but then the server doesn't know these either,
> so the applications might break anyway.

The same thought had occurred to me. Probably the main use of the
datetime parsing code in ecpg is for interpreting outputs from the
server, and (at least by default) the server doesn't use timezone
abbreviations when printing timestamps. So maybe that's largely
dead code anyhow. I would not propose back-patching such a change,
but we could try it in 9.5 and see if anyone complains.

A less drastic remedy would be to remove just those abbreviations
whose meaning has actually changed over time. Eventually that
might be all of them ... but in the meantime, we could at least
argue that we weren't breaking any case that worked well before.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-15 19:41:18
Message-ID: 18061.1413402078@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

... and here is a draft patch for the timezone abbreviation data files.

I changed all the abbreviations for which the parent zone had used more
than one GMT offset since 1970. That seemed like a good cutoff to avoid
wasting cycles on ancient history, especially since the IANA people
themselves don't make any large promises about the quality of their data
before 1970.

Although zones in the Russian Federation are the majority of zones that
had abbreviation changes, a quick look at this patch shows that they're
hardly the only ones. We've been sticking our heads in the sand about
this problem for quite a while :-(

regards, tom lane

Attachment Content-Type Size
dynamic-abbrevs-data-changes-1.patch text/x-diff 47.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-15 21:43:36
Message-ID: 21973.1413409416@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

A bit of documentation-hacking later, here is the complete proposed patch
including docs updates.

As a reminder, I'm proposing to apply this to the back branches as well.
There's some further cleanup that should only happen in HEAD, but I think
we are going to be hearing about it if released branches fail to cope
nicely with the upcoming Russian zone changes.

regards, tom lane

Attachment Content-Type Size
dynamic-zone-abbrevs-full-1.patch text/x-diff 193.9 KB

From: Michael Meskes <meskes(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-22 08:00:23
Message-ID: 20141022080023.GA8495@feivel.credativ.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 15, 2014 at 09:50:16AM -0400, Tom Lane wrote:
> The same thought had occurred to me. Probably the main use of the
> datetime parsing code in ecpg is for interpreting outputs from the
> server, and (at least by default) the server doesn't use timezone
> abbreviations when printing timestamps. So maybe that's largely
> dead code anyhow. I would not propose back-patching such a change,
> but we could try it in 9.5 and see if anyone complains.

Agreed on all accounts.

> A less drastic remedy would be to remove just those abbreviations
> whose meaning has actually changed over time. Eventually that
> might be all of them ... but in the meantime, we could at least
> argue that we weren't breaking any case that worked well before.

This is what your patch did, right?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Meskes <meskes(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-10-22 15:32:41
Message-ID: 11467.1413991961@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Meskes <meskes(at)postgresql(dot)org> writes:
> On Wed, Oct 15, 2014 at 09:50:16AM -0400, Tom Lane wrote:
>> The same thought had occurred to me. Probably the main use of the
>> datetime parsing code in ecpg is for interpreting outputs from the
>> server, and (at least by default) the server doesn't use timezone
>> abbreviations when printing timestamps. So maybe that's largely
>> dead code anyhow. I would not propose back-patching such a change,
>> but we could try it in 9.5 and see if anyone complains.

> Agreed on all accounts.

>> A less drastic remedy would be to remove just those abbreviations
>> whose meaning has actually changed over time. Eventually that
>> might be all of them ... but in the meantime, we could at least
>> argue that we weren't breaking any case that worked well before.

> This is what your patch did, right?

No, I did not touch ecpg's set of tokens at all, just changed the
representation of datetktbl to match the new backend coding.
I figured we could discuss behavioral changes separately.

I don't have a strong opinion about which of the above things to do ...
what's your preference?

regards, tom lane


From: Michael Meskes <meskes(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Date: 2014-11-04 16:29:59
Message-ID: 20141104162959.GA6430@feivel.credativ.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 22, 2014 at 11:32:41AM -0400, Tom Lane wrote:
> I don't have a strong opinion about which of the above things to do ...
> what's your preference?

I think it's better for the future if me make a clean cut. Yes, the option
keeps compatability a little bit higher, but that doesn't matter that much as
the codes won't be used by the server anyway. Besides, they may eventually
change and then we have to make sure to remember ecpg's copy again.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL