Quick Links

Re: storing TZ along timestamps

Lists:	pgsql-hackers

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	storing TZ along timestamps
Date:	2011-05-27 20:43:28
Message-ID:	1306528155-sup-8852@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

One of our customers is interested in being able to store original
timezone along with a certain timestamp.

It is currently possible to store a TZ in a separate column, but this is
a bit wasteful and not very convenient anyway.

There are all sorts of UI issues that need to be resolved in order for
this to be a complete feature proposal, but the first thing that we
discussed was what is the storage going to look like. Of course, one
thing we don't want is to store the complete TZ name as text.

So the first thing is cataloguing timezone names, and assigning an ID to
each (maybe an OID). If we do that, then we can store the OID of the
timezone name along the int64/float8 of the actual timestamp value.

Right now we rely on the tzdata files on disk for things like
pg_timezone_names and other accesses of TZ data; so the files are the
authoritative source of TZ info. So we need to ensure that whenever the
files are updated, the catalogs are updated as well.
I think we could make this work if we "refreshed" the catalog from the
files on SIGHUP if the directory changes (say, a new timezone is
created). Note that I am currently proposing to store only the zone
names in the catalog, not the full TZ data.

Are there objections to the general idea? If not, I'll flesh a more
complete proposal.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 21:09:02
Message-ID:	46BA08FD-6938-440B-9925-EBA049FEAD5B@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 27, 2011, at 1:43 PM, Alvaro Herrera wrote:

> Right now we rely on the tzdata files on disk for things like
> pg_timezone_names and other accesses of TZ data; so the files are the
> authoritative source of TZ info. So we need to ensure that whenever the
> files are updated, the catalogs are updated as well.
> I think we could make this work if we "refreshed" the catalog from the
> files on SIGHUP if the directory changes (say, a new timezone is
> created). Note that I am currently proposing to store only the zone
> names in the catalog, not the full TZ data.
>
> Are there objections to the general idea? If not, I'll flesh a more
> complete proposal.

I like it, but what do you do when a TZ has been renamed or has ceased to exist. Or, worse, existed last week, so last week's dates might still use it, but next week's must not?

Best,

David

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 21:10:42
Message-ID:	3541.1306530642@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Right now we rely on the tzdata files on disk for things like
> pg_timezone_names and other accesses of TZ data; so the files are the
> authoritative source of TZ info. So we need to ensure that whenever the
> files are updated, the catalogs are updated as well.
> I think we could make this work if we "refreshed" the catalog from the
> files on SIGHUP if the directory changes (say, a new timezone is
> created).

(1) SIGHUP processing normally occurs outside any transaction.

(2) The only obvious way to ensure the "refresh" is done once, and not
once per backend, is to have the postmaster do it ... which is a
nonstarter for many reasons.

I'd suggest instead considering something like the pg_collations
approach: load up the catalog once at initdb. If the user really needs
to add to the set of accessible TZ names later, give him a tool to do
that. But it's 100% not worth either the implementation pain or the
cycles to try to auto-update the catalog, especially not as often as
once per SIGHUP.

BTW, what will you do about pg_upgrade? Ensuring the OID mapping
doesn't change seems like loads of fun.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc:	"Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 21:32:37
Message-ID:	4DDFD225020000250003DE1C@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> wrote:

> I like it, but what do you do when a TZ has been renamed or has
> ceased to exist. Or, worse, existed last week, so last week's
> dates might still use it, but next week's must not?

I think the key thing is that the timestamp portion of it would be
identical to our current TIMESTAMP WITH TIME ZONE -- always store it
in the value UTC zone. That way comparisons and math between
timestamps could remain sane. The stored time zone portion would be
what it would be the display format, if usable. In an extreme
situation like you describe above, I guess you could fall back on
what we do now for display of a timestamptz value.

Personally, I think it would be good to move a bit closer to the
standard by including a time zone in a TIMESTAMP WITH TIME ZONE
value. The biggest problem I can see is how to try to do this in a
standard conforming fashion without breaking existing code. It
would seem more than a little odd to support the standard semantics
with nonstandard syntax and vice versa.

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 21:35:41
Message-ID:	5207.1306532141@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> I like it, but what do you do when a TZ has been renamed or has ceased
> to exist.

As far as that goes, I think "nothing" is a sufficient answer. There's
no requirement that an OID in the mapping table correspond to a live TZ.
It's just a more compact way of storing a string name.

regards, tom lane

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 21:39:53
Message-ID:	E769A699-25EB-4822-9127-634E791B0B6E@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 27, 2011, at 2:35 PM, Tom Lane wrote:

> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
>> I like it, but what do you do when a TZ has been renamed or has ceased
>> to exist.
>
> As far as that goes, I think "nothing" is a sufficient answer. There's
> no requirement that an OID in the mapping table correspond to a live TZ.
> It's just a more compact way of storing a string name.

Well then you'd just want to be sure to never delete TZs.

I think the issue of trying this week to use a TZ that was removed last week might be more problematic. I mean, we could just let the user use it, but that hardly seems wise…

Best,

David

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 22:54:05
Message-ID:	BANLkTik3Sugo5xwsL--XG7R1hM=+jq1g5A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, May 27, 2011 at 2:32 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> I think the key thing is that the timestamp portion of it would be
> identical to our current TIMESTAMP WITH TIME ZONE -- always store it
> in the value UTC zone.

Fwiw our timestamp with time zone stores seconds since the epoch. This
is a quantity which is independent of timezones entirely. Excluding
relativistic effects there have been the same number of time zones
since that point in time regardless of where you stand relative to the
sun.

My question for Alvarro is whether he really wants the text label for
the time zone at all, or just the offset which was used to enter it.
That is, if I enter "12:00pm" with my current time zone set to GMT and
later update the tzdata on the machine to start summer time on a
earlier date should the data type now show "1:00pm BST" or should it
still display "12:00pm +000" and leave it up to the reader to decide
whether why I entered it in a weird time zone for that time of year?

--
greg

From:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 23:13:12
Message-ID:	4DE03008.5050900@pinpointresearch.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/27/2011 01:43 PM, Alvaro Herrera wrote:
> Hi,
>
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.
I am very interested in the use-case for this (in part as I'm working on
a PG related time talk). My experience thus far is that people who want
this do not fully understand the nature of date-time calculations and
variables in PG.
> It is currently possible to store a TZ in a separate column, but this is
> a bit wasteful and not very convenient anyway.

>
> Are there objections to the general idea? If not, I'll flesh a more
> complete proposal.
I'm not crazy about it.

Although time-stamp-with-time-zone is, perhaps, a bad name for what is
actually a "point in time", a point-in-time is what timestamptz
represents. I can enter it and allow my defaults to take over, specify
abbreviations, explicit offsets or long names none of which change the
actual point in time. Likewise, I can display said point-in-time in any
of dozens of ways according to my needs.

steve=# select '2011-05-27 12:34'::timestamptz;
timestamptz
------------------------
2011-05-27 12:34:00-07

steve=# select '2011-05-27 12:34-07'::timestamptz;
timestamptz
------------------------
2011-05-27 12:34:00-07

steve=# select '2011-05-27 12:34 PDT'::timestamptz;
timestamptz
------------------------
2011-05-27 12:34:00-07

steve=# select '2011-05-27 11:34 PST'::timestamptz;
timestamptz
------------------------
2011-05-27 12:34:00-07

steve=# select '2011-05-27 15:34 US/Eastern'::timestamptz;
timestamptz
------------------------
2011-05-27 12:34:00-07

select now() - '02:58:54.605041'::interval;
?column?
-------------------------------
2011-05-27 12:34:00.394959-07

Granted, I'm a random sample of 1, but I've never found anyone with a
real need for this feature - especially since the capability already
exists to achieve the requested result, and much more flexibly, by
either a separate column or a user-defined type.

Questions:

What would be the storage impact (tables, indexes and backups) for those
of use with tens-of-millions of pieces of timestamp data?

What type of timestamp would be stored? Abbreviated/offset (PST, -07),
full (US/Eastern) or a mix? Is there an expectation that the stored time
zone information would be used for any calculation purposes? If so, how
would rules be applied? Would there be any form of error-checking?
Currently PG accepts non-existent time zones but maps them to UTC:

steve=# select '2011-05-27 15:34'::timestamptz at time zone 'US/f00';
timezone
---------------------
2011-05-27 15:34:00

Would there be any impact to existing queries?

How would dump/restore issues be handled - especially if the time-zone
info changes in between?

More as I think of them.

Cheers,
Steve

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 23:29:55
Message-ID:	BANLkTinL7CDj+GShTY0SXWHnWNyGjQ8WNA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, May 27, 2011 at 4:13 PM, Steve Crawford
<scrawford(at)pinpointresearch(dot)com> wrote:
> I am very interested in the use-case for this (in part as I'm working on a
> PG related time talk). My experience thus far is that people who want this
> do not fully understand the nature of date-time calculations and variables
> in PG.

The use cases I recall having been mentioned in the past were accurate
data retention and calendaring applications.

Accurate data retention for things like drug trials need to guarantee
they retain precisely what the user entered, not an equivalent value.
If you run a report on a drug trial you need to see that the event was
recorded as occuring at 1:00pm EST not 6:00pm GMT even if you happen
to run the report in London.

And calendaring apps want to know what timezone is attached to an
event, not only the point in time at which it occurs. If your plane
flight departs at 12:00pm GMT and lands at 2:00pm EST you need to know
that to book your taxi at 2:30pm EST -- not 7:30pm GMT.

Both of these two cases can be handled differently. The former by
storing the raw text inputs and then storing the interpreted value as
a derived column separetly, and the latter by storing the local time
zone to use for display as an additional attribute along with the
local address and other attributes of the calendar event.

--
greg

From:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-27 23:57:56
Message-ID:	4DE03A84.6050304@pinpointresearch.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/27/2011 04:29 PM, Greg Stark wrote:
> On Fri, May 27, 2011 at 4:13 PM, Steve Crawford
> <scrawford(at)pinpointresearch(dot)com> wrote:
>> I am very interested in the use-case for this (in part as I'm working on a
>> PG related time talk). My experience thus far is that people who want this
>> do not fully understand the nature of date-time calculations and variables
>> in PG.
> The use cases I recall having been mentioned in the past were accurate
> data retention and calendaring applications.
>
> Accurate data retention for things like drug trials need to guarantee
> they retain precisely what the user entered, not an equivalent value.
> If you run a report on a drug trial you need to see that the event was
> recorded as occuring at 1:00pm EST not 6:00pm GMT even if you happen
> to run the report in London.
>
> And calendaring apps want to know what timezone is attached to an
> event, not only the point in time at which it occurs. If your plane
> flight departs at 12:00pm GMT and lands at 2:00pm EST you need to know
> that to book your taxi at 2:30pm EST -- not 7:30pm GMT.
>
> Both of these two cases can be handled differently. The former by
> storing the raw text inputs and then storing the interpreted value as
> a derived column separetly, and the latter by storing the local time
> zone to use for display as an additional attribute along with the
> local address and other attributes of the calendar event.
>
So the proposed change does not handle the first case as you need to
capture the raw input.

And the second case is already well handled. In fact calendaring is a
great example. I enter the time for the teleconference and PG nicely
uses my default timezone to store the point-in-time. When you retrieve
it, it is shown in your timezone and we both pick up the phone at the
correct time. And if I know I'll be somewhere else at that time, I just
ask for the data in that zone. Altering the data type gains nothing.

Cheers,
Steve

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-28 00:10:58
Message-ID:	3897E3E8-30BD-4639-8A4A-C9061D4E432E@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 27, 2011, at 6:29 PM, Greg Stark wrote:
> Both of these two cases can be handled differently. The former by
> storing the raw text inputs and then storing the interpreted value as
> a derived column separetly, and the latter by storing the local time
> zone to use for display as an additional attribute along with the
> local address and other attributes of the calendar event.

Which means you're back to a very cumbersome method that involves another field. That's a tremendous amount of extra code.

We run multiple businesses around the globe. Each business operates in it's own timezone, and 90% of the time we want things handled in that timezone. The wheels fall off the wagon if we try and combine data from multiple locations into a single database; there's no reasonable way to say: give me the data in this field *at the timezone that was originally entered*, except for not storing timezone data at all. If we don't store timezone data at all, then it's impossible to determine an actual point in time that something happened at.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	tomas(at)tuxteam(dot)de
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-28 06:26:53
Message-ID:	20110528062653.GA28589@tomas
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, May 27, 2011 at 04:43:28PM -0400, Alvaro Herrera wrote:
> Hi,
>
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.

I've felt that pain here and there too...

> So the first thing is cataloguing timezone names, and assigning an ID to
> each (maybe an OID). If we do that, then we can store the OID of the
> timezone name along the int64/float8 of the actual timestamp value.
>
> Right now we rely on the tzdata files on disk for things like
> pg_timezone_names and other accesses of TZ data; so the files are the
> authoritative source of TZ info. So we need to ensure that whenever the
> files are updated, the catalogs are updated as well.

Problem with this approach (mapping external time zone names to OIDs)
is: dump/restore would only be meaningful if you "carry over" the time
zone data, right?

That is: two independent systems are likely to have different mappings
(even if at some point they have the "same" TZ data?)

What would be a solution to that?

(a) A central, "official" catalog, with only additions, never deletions
(perhaps with some space carved out for "local" additions, to minimize
conflicts)?
(b) A hash of the time zone name?

Both not very good ideas, I know. Although (a) might be less bad than it
seems. Most Unixoids (including OSX) seem to have basically Olson's.
Don't know about Windows, but it might seem feasible to make some
mapping (or union). Only important rule: no backtrack :-)

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFN4JWtBcgs9XrR2kYRAt+mAJ0atx3u6pll50+s4vVwCKZUjqmnSQCffWNe
gzSFgRCFUvsd8pbH1Qm/ho4=
=FVhO
-----END PGP SIGNATURE-----

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-05-28 21:58:48
Message-ID:	1306619928.342.5.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On fre, 2011-05-27 at 16:57 -0700, Steve Crawford wrote:
> And the second case is already well handled. In fact calendaring is a
> great example. I enter the time for the teleconference and PG nicely
> uses my default timezone to store the point-in-time. When you
> retrieve
> it, it is shown in your timezone and we both pick up the phone at the
> correct time. And if I know I'll be somewhere else at that time, I
> just
> ask for the data in that zone. Altering the data type gains nothing.

How about a recurring appointment that happens every Tuesday whenever it
is 9:00am in California, independent of DST (in California or where ever
the participant actually is). I'm not sure how to solve that within the
SQL framework. You might need to use time with time zone with a
placeholder timezone, and then a rule that date + time with time zone
creates a timestamp with time zone that resolves the time zone for that
particular day.

From:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-01 18:13:34
Message-ID:	4DE6814E.4040802@pinpointresearch.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/28/2011 02:58 PM, Peter Eisentraut wrote:
> On fre, 2011-05-27 at 16:57 -0700, Steve Crawford wrote:
>> And the second case is already well handled. In fact calendaring is a
>> great example. I enter the time for the teleconference and PG nicely
>> uses my default timezone to store the point-in-time. When you
>> retrieve
>> it, it is shown in your timezone and we both pick up the phone at the
>> correct time. And if I know I'll be somewhere else at that time, I
>> just
>> ask for the data in that zone. Altering the data type gains nothing.
> How about a recurring appointment that happens every Tuesday whenever it
> is 9:00am in California, independent of DST (in California or where ever
> the participant actually is). I'm not sure how to solve that within the
> SQL framework. You might need to use time with time zone with a
> placeholder timezone, and then a rule that date + time with time zone
> creates a timestamp with time zone that resolves the time zone for that
> particular day.
>
>
Interval math is pretty smart about that:

select '2011-05-31 09:00'::timestamp at time zone 'PST8PDT' + ('7
days'::interval * generate_series(1,60));
------------------------
2011-06-07 09:00:00-07
2011-06-14 09:00:00-07
2011-06-21 09:00:00-07
2011-06-28 09:00:00-07
2011-07-05 09:00:00-07
2011-07-12 09:00:00-07
2011-07-19 09:00:00-07
2011-07-26 09:00:00-07
2011-08-02 09:00:00-07
2011-08-09 09:00:00-07
2011-08-16 09:00:00-07
2011-08-23 09:00:00-07
2011-08-30 09:00:00-07
2011-09-06 09:00:00-07
2011-09-13 09:00:00-07
2011-09-20 09:00:00-07
2011-09-27 09:00:00-07
2011-10-04 09:00:00-07
2011-10-11 09:00:00-07
2011-10-18 09:00:00-07
2011-10-25 09:00:00-07
2011-11-01 09:00:00-07
2011-11-08 09:00:00-08
2011-11-15 09:00:00-08
2011-11-22 09:00:00-08
2011-11-29 09:00:00-08
2011-12-06 09:00:00-08
2011-12-13 09:00:00-08
2011-12-20 09:00:00-08
2011-12-27 09:00:00-08
2012-01-03 09:00:00-08
2012-01-10 09:00:00-08
2012-01-17 09:00:00-08
2012-01-24 09:00:00-08
2012-01-31 09:00:00-08
2012-02-07 09:00:00-08
2012-02-14 09:00:00-08
2012-02-21 09:00:00-08
2012-02-28 09:00:00-08
2012-03-06 09:00:00-08
2012-03-13 09:00:00-07
2012-03-20 09:00:00-07
2012-03-27 09:00:00-07
2012-04-03 09:00:00-07
2012-04-10 09:00:00-07
2012-04-17 09:00:00-07
2012-04-24 09:00:00-07
2012-05-01 09:00:00-07
2012-05-08 09:00:00-07
2012-05-15 09:00:00-07
2012-05-22 09:00:00-07
2012-05-29 09:00:00-07
...

Or if you have to call in from London (notice the blips between 4pm and
5pm due to London and California switching to/from DST on different dates):

select ('2011-05-31 09:00'::timestamp at time zone 'PST8PDT' + ('7
days'::interval * generate_series(1,60))) at time zone 'Europe/London';
---------------------
2011-06-07 17:00:00
2011-06-14 17:00:00
2011-06-21 17:00:00
2011-06-28 17:00:00
2011-07-05 17:00:00
2011-07-12 17:00:00
2011-07-19 17:00:00
2011-07-26 17:00:00
2011-08-02 17:00:00
2011-08-09 17:00:00
2011-08-16 17:00:00
2011-08-23 17:00:00
2011-08-30 17:00:00
2011-09-06 17:00:00
2011-09-13 17:00:00
2011-09-20 17:00:00
2011-09-27 17:00:00
2011-10-04 17:00:00
2011-10-11 17:00:00
2011-10-18 17:00:00
2011-10-25 17:00:00
2011-11-01 16:00:00
2011-11-08 17:00:00
2011-11-15 17:00:00
2011-11-22 17:00:00
2011-11-29 17:00:00
2011-12-06 17:00:00
2011-12-13 17:00:00
2011-12-20 17:00:00
2011-12-27 17:00:00
2012-01-03 17:00:00
2012-01-10 17:00:00
2012-01-17 17:00:00
2012-01-24 17:00:00
2012-01-31 17:00:00
2012-02-07 17:00:00
2012-02-14 17:00:00
2012-02-21 17:00:00
2012-02-28 17:00:00
2012-03-06 17:00:00
2012-03-13 16:00:00
2012-03-20 16:00:00
2012-03-27 17:00:00
2012-04-03 17:00:00
2012-04-10 17:00:00
2012-04-17 17:00:00
2012-04-24 17:00:00
2012-05-01 17:00:00
2012-05-08 17:00:00
...

Cheers,
Steve

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-01 23:57:40
Message-ID:	1306972660.7605.0.camel@jdavis-ux.asterdata.local
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
> Hi,
>
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.

I assume that you're talking about a new data type, not augmenting the
current types, correct?

Regards,
Jeff Davis

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 00:18:44
Message-ID:	1306973919-sup-8205@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011:
> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
> > Hi,
> >
> > One of our customers is interested in being able to store original
> > timezone along with a certain timestamp.
>
> I assume that you're talking about a new data type, not augmenting the
> current types, correct?

Yes

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 01:36:32
Message-ID:	BANLkTikFbURqtP7-V4NJK7Rseq6kYL3=cQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 1, 2011 at 8:18 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011:
>> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
>> > Hi,
>> >
>> > One of our customers is interested in being able to store original
>> > timezone along with a certain timestamp.
>>
>> I assume that you're talking about a new data type, not augmenting the
>> current types, correct?
>
> Yes

why not use a composite type for that? performance maybe?

merlin

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 17:55:07
Message-ID:	1307037210-sup-221@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Merlin Moncure's message of mié jun 01 21:36:32 -0400 2011:
> On Wed, Jun 1, 2011 at 8:18 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011:
> >> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
> >> > Hi,
> >> >
> >> > One of our customers is interested in being able to store original
> >> > timezone along with a certain timestamp.
> >>
> >> I assume that you're talking about a new data type, not augmenting the
> >> current types, correct?
> >
> > Yes
>
> why not use a composite type for that? performance maybe?

To avoid having to implement all the operators and lookup tables (of
timezones) in userland, mainly. Probably performance would be affected
too, not sure, but that's not the main point.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 18:06:59
Message-ID:	4DE7D143.7020508@pinpointresearch.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/01/2011 05:18 PM, Alvaro Herrera wrote:
> Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011:
>> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
>>> Hi,
>>>
>>> One of our customers is interested in being able to store original
>>> timezone along with a certain timestamp.
>> I assume that you're talking about a new data type, not augmenting the
>> current types, correct?
> Yes
>
That eliminates many of my issues - I just didn't want the type changed
underneath me. But some considerations remain - including some new that
have crossed my mind:

1. How would the time-zone be defined in this composite? Offset from
GMT? Timezone (well, link thereto) with all DST rules intact? Would
"extract" need to be modified to include the ability to grab the timezone?

2. What would be the precedence for defining originating timezone?
Default? Set timezone to? ...at time zone...? Based on the timestamp
(2011-06-02 12:34:56-07)?

3. Would indexing/sorting include the originating zone? If so, how would
time zones collate (base offset, actual offset based on the timestamp,
name)?

4. What would be the corresponding type when used with
Perl/PHP/Python/... applications - would they require special
non-standard handling?

Since this isn't going to alter my current beloved timestamptz and I
don't have a use-case I leave the decisions on the above to others. But
in my imagined use-cases I still see the originating zone as a separate
piece of information better handled as a different column - for example
sorting by timestamp plus priority or selecting everything for a
specific time zone.

Cheers,
Steve

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 18:36:22
Message-ID:	BANLkTikA4aZQaKn4V1RGQdAG2g3ddL5Vrw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 2, 2011 at 12:55 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>> >> > One of our customers is interested in being able to store original
>> >> > timezone along with a certain timestamp.
>> >>
>> >> I assume that you're talking about a new data type, not augmenting the
>> >> current types, correct?
>> >
>> > Yes
>>
>> why not use a composite type for that? performance maybe?
>
> To avoid having to implement all the operators and lookup tables (of
> timezones) in userland, mainly. Probably performance would be affected
> too, not sure, but that's not the main point.

right -- I see where you are going with this. ok, some random questions:
*) what about making a 'timezone' type in addition to (or even instead
of) the timezonetz_inputtz? Then you could in theory treat the your
proposed type as a composite of timezonetz and timezone, just as
timestamptz is a 'composite' of date and timetz. (note I'm not
necessarily arguing against the creation of a specific unified type --
performance is important for time types).

*) in/out formats...what would be the wire formats of your type -- in
particular, the binary format?

*) do you see your type interacting with various datetime function
(like extract) or will a cast be required? Interval math?

*) how does ordering and uniqueness apply to same timestamps with
unique input time zones?

merlin

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 18:37:45
Message-ID:	1307039865.11647.12.camel@jdavis-ux.asterdata.local
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.

Another thing to consider is that this will eliminate any useful total
order.

You could define an arbitrary total order, of course, just to allow
BTrees for equality searches. However, I don't think you should define
">" (and other non-equality comparator operators) according to that
total order -- they should be more hidden like "~>~". ">" should not
exist as an operator over this type at all.

I also do not like the idea of having "=" mean "equivalent after
timezone adjustment". If we're making a distinction between "2000-01-01
10:00:00 +03" and "2000-01-01 9:00:00 +02", then "=" should not obscure
that distinction.

Regards,
Jeff Davis

From:	Christopher Browne <cbbrowne(at)gmail(dot)com>
To:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 18:46:18
Message-ID:	BANLkTikx9OyLO4q390y-ungVuse0tuSytA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 2, 2011 at 6:06 PM, Steve Crawford
<scrawford(at)pinpointresearch(dot)com> wrote:
> On 06/01/2011 05:18 PM, Alvaro Herrera wrote:
>>
>> Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011:
>>>
>>> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
>>>>
>>>> Hi,
>>>>
>>>> One of our customers is interested in being able to store original
>>>> timezone along with a certain timestamp.
>>>
>>> I assume that you're talking about a new data type, not augmenting the
>>> current types, correct?
>>
>> Yes
>>
> That eliminates many of my issues - I just didn't want the type changed
> underneath me. But some considerations remain - including some new that have
> crossed my mind:
>
> 1. How would the time-zone be defined in this composite? Offset from GMT?
> Timezone (well, link thereto) with all DST rules intact? Would "extract"
> need to be modified to include the ability to grab the timezone?

That doesn't seem appropriate, because timezones are not always
represented by strict offsets from GMT. Some frequently-used
timezones represent variable offsets. ("EDT/EST", I'm looking at
you!)

> 2. What would be the precedence for defining originating timezone? Default?
> Set timezone to? ...at time zone...? Based on the timestamp (2011-06-02
> 12:34:56-07)?
>
> 3. Would indexing/sorting include the originating zone? If so, how would
> time zones collate (base offset, actual offset based on the timestamp,
> name)?

Some timezones contain discontinuities, so that the notion of sorting
them seems implausible, as there isn't properly an "ordering."

> 4. What would be the corresponding type when used with Perl/PHP/Python/...
> applications - would they require special non-standard handling?
>
> Since this isn't going to alter my current beloved timestamptz and I don't
> have a use-case I leave the decisions on the above to others. But in my
> imagined use-cases I still see the originating zone as a separate piece of
> information better handled as a different column - for example sorting by
> timestamp plus priority or selecting everything for a specific time zone.

I'd tend to think that this is best captured by having two pieces of
information:
a) The timestamp in UTC terms, so that it's a totally stable value,
which is amenable to comparison against other timestamps (irrespective
of timezone)
b) A symbolic representation of the timezone, perhaps its name.

It's not at all obvious that these ought to be treated as a singular data type.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc:	Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-02 19:02:01
Message-ID:	1307041321.11647.34.camel@jdavis-ux.asterdata.local
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2011-06-02 at 18:46 +0000, Christopher Browne wrote:
> > 1. How would the time-zone be defined in this composite? Offset from GMT?
> > Timezone (well, link thereto) with all DST rules intact? Would "extract"
> > need to be modified to include the ability to grab the timezone?
>
> That doesn't seem appropriate, because timezones are not always
> represented by strict offsets from GMT. Some frequently-used
> timezones represent variable offsets. ("EDT/EST", I'm looking at
> you!)

In conjunction with a specific timestamp, a timezone does strictly map
to a single offset.

That is, unless it's a timestamp in the future, and someone decides to
adjust a timezone before the timestamp actually occurs. But that's a
problem with the current timestamptz implementation anyway...

> > Since this isn't going to alter my current beloved timestamptz and I don't
> > have a use-case I leave the decisions on the above to others. But in my
> > imagined use-cases I still see the originating zone as a separate piece of
> > information better handled as a different column - for example sorting by
> > timestamp plus priority or selecting everything for a specific time zone.

I have a similar inclination. ">" seems like the fundamental operation
you'd want to perform on any timestamp (perhaps more so than equality),
and that's not well-defined if there is no total order (but several
meaningful partial orders).

However, I do see some nice benefits, too. The main one is that you can
easily get either local time or GMT out of it. So you can answer queries
such as "which of these activities occurred outside of normal business
hours" as well as "which of these events happened first". It would take
a little care to use properly, however.

Regards,
Jeff Davis

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-03 00:28:57
Message-ID:	BANLkTi=wsL84jF6MeaBctEP5397cNjOGGQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 2, 2011 at 3:02 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2011-06-02 at 18:46 +0000, Christopher Browne wrote:
>> > 1. How would the time-zone be defined in this composite? Offset from GMT?
>> > Timezone (well, link thereto) with all DST rules intact? Would "extract"
>> > need to be modified to include the ability to grab the timezone?
>>
>> That doesn't seem appropriate, because timezones are not always
>> represented by strict offsets from GMT. Some frequently-used
>> timezones represent variable offsets. ("EDT/EST", I'm looking at
>> you!)
>
> In conjunction with a specific timestamp, a timezone does strictly map
> to a single offset.

But that doesn't seem like enough, because if someone adds '1 day',
knowing the offset isn't sufficient to figure out the answer. You
have to know where the DST boundary is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-03 00:48:53
Message-ID:	1307062133.15672.10.camel@jdavis-ux.asterdata.local
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2011-06-02 at 20:28 -0400, Robert Haas wrote:
> But that doesn't seem like enough, because if someone adds '1 day',
> knowing the offset isn't sufficient to figure out the answer. You
> have to know where the DST boundary is.

Good point, I guess the timezone itself needs to be stored. That's a
little unfortunate, because timezones are somewhat of a moving target
(which I think was Tom's point).

That means that we'd need an entire history (and future?) of timezone
definitions, and apply the timezone definition as of the associated
timestamp to get the offset. Or, should we apply the timezone definition
as of the "real" time the value was entered?

Regards,
Jeff Davis

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-03 03:58:33
Message-ID:	0B9D4F4F-CDCD-49C2-A926-CF37756AB7C5@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jun 2, 2011, at 7:48 PM, Jeff Davis wrote:
> On Thu, 2011-06-02 at 20:28 -0400, Robert Haas wrote:
>> But that doesn't seem like enough, because if someone adds '1 day',
>> knowing the offset isn't sufficient to figure out the answer. You
>> have to know where the DST boundary is.
>
> Good point, I guess the timezone itself needs to be stored. That's a
> little unfortunate, because timezones are somewhat of a moving target
> (which I think was Tom's point).
>
> That means that we'd need an entire history (and future?) of timezone
> definitions, and apply the timezone definition as of the associated
> timestamp to get the offset. Or, should we apply the timezone definition
> as of the "real" time the value was entered?

As someone else mentioned, timestamptz suffers the exact same problems.

I'm torn between whether the type should store the original time or the original time converted to GMT. I believe you would have the most accuracy if you stored the original time... but then indexing becomes problematic. I don't know if this data quality issue can be solved by anything short of somehow storing the actual timezone rule that was in place when the data was set.

Speaking of input; someone asked what timezone should be used as the "original" timezone. We should use whatever timezone was passed in with the value, and if one wasn't passed in we should use whatever the timezone GUC is set to (I'm assuming that's what timestamptz does).
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-03 09:56:22
Message-ID:	1307094982.32120.2.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On tor, 2011-06-02 at 22:58 -0500, Jim Nasby wrote:
> I'm torn between whether the type should store the original time or
> the original time converted to GMT. I believe you would have the most
> accuracy if you stored the original time... but then indexing becomes
> problematic. I don't know if this data quality issue can be solved by
> anything short of somehow storing the actual timezone rule that was in
> place when the data was set.
>
> Speaking of input; someone asked what timezone should be used as the
> "original" timezone. We should use whatever timezone was passed in
> with the value, and if one wasn't passed in we should use whatever the
> timezone GUC is set to (I'm assuming that's what timestamptz does).

I think all of that comes down to business rules. Train and airline
companies etc. have probably figured this out for themselves, not
necessarily consistent with each other. So it's doubtful whether a
single solution that we can hash out here is going to work well in
practice. I think making it easier to implement particular business
rules in this area in userspace (composite types, etc.) might go a
longer way.

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-04 08:56:43
Message-ID:	BANLkTimRGA59Bwsc4C4tO8RKUODsyMyitw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>
> I'm torn between whether the type should store the original time or the original time converted to GMT.

This is the wrong way to think about it. We *never* store time
"converted to GMT". When we want to represent a point in time we
represent it as seconds since the epoch. This is a quantity that is
the same everywhere in the world (modulo relativistic effects...) ie.
it's a concept which is completely independent of timezones.

The question here is how to represent more complex concepts than
simply points in time. I think the two concepts under discussion are
a) a composite type representing a point in time and a timezone it
should be interpreted in for operations and display and b) the
original input provided which is a text string with the constraint
that it's a valid input which can be interpreted as a point in time.

--
greg

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-06-06 06:50:24
Message-ID:	ECA252D8-C4E6-4CBF-8C47-DD958BD1664F@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>
>> I'm torn between whether the type should store the original time or the original time converted to GMT.
>
> This is the wrong way to think about it. We *never* store time
> "converted to GMT". When we want to represent a point in time we
> represent it as seconds since the epoch.
Right. Sorry, my bad.

> The question here is how to represent more complex concepts than
> simply points in time. I think the two concepts under discussion are
> a) a composite type representing a point in time and a timezone it
> should be interpreted in for operations and display and b) the
> original input provided which is a text string with the constraint
> that it's a valid input which can be interpreted as a point in time.

My fear with A is that something could change that would make it impossible to actually get back to the time that was originally entered. For example, a new version of the timezone database could change something. Though, that problem also exists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Alexey Klyukin <alexk(at)commandprompt(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-06 14:24:52
Message-ID:	6C1CEB90-DF33-474B-A579-6E981A556C7C@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On May 27, 2011, at 11:43 PM, Alvaro Herrera wrote:
>
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.
>
> It is currently possible to store a TZ in a separate column, but this is
> a bit wasteful and not very convenient anyway.
>
> There are all sorts of UI issues that need to be resolved in order for
> this to be a complete feature proposal, but the first thing that we
> discussed was what is the storage going to look like. Of course, one
> thing we don't want is to store the complete TZ name as text.
>
> So the first thing is cataloguing timezone names, and assigning an ID to
> each (maybe an OID). If we do that, then we can store the OID of the
> timezone name along the int64/float8 of the actual timestamp value.

So, I'd think there are 2 reasonable approaches to storing the
timezone part:

1. Store the timezone abbreviation (i.e. 'EST' along w/ the timestamp
data).
2. Assign OID to each of the timezones and store it w/ the timestamp.

The first option seem to avoid the necessity of creating a new system
catalog for timezone information and the burden of updating it,
because current implementation is already capable of translating
abbreviations to useful timezone information. The question is, whether
just a TZ abbreviation is sufficient to uniquely identify the timezone
and get the offset and DST rules. If it's not sufficient, how
conflicting TZ short names are handled in the current code (i.e. 'AT
TIME ZONE ...')?

The second choice doesn't avoids the issue of ambiguous names,
although it requires moving TZ information inside the database and
providing some means to update it. There were mentions of potential
problems w/ pg_upgrade and pg_dump, if we add a massive amount of
oids for the timezones. What are these problems specifically?

I'd thing that storing TZ abbreviations is more straightforward and
easier to implement, unless there are too ambiguous to identify the
timezone correctly.

> Note that I am currently proposing to store only the zone
> names in the catalog, not the full TZ data.

Where would we store other bits of timezone information? Wouldn't it
be inconvenient to update names in system catalogs and DST rules
elsewhere?

Alexey.

--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Stuart Bishop <stuart(at)stuartbishop(dot)net>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-08 15:13:17
Message-ID:	CADmi=6NzRb3UXyrxKYefrPZkPxYC9o-o=K416DNhY2kW=B7tnQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
>> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>>
>>> I'm torn between whether the type should store the original time or the original time converted to GMT.
>>
>> This is the wrong way to think about it. We *never* store time
>> "converted to GMT". When we want to represent a point in time we
>> represent it as seconds since the epoch.
> Right. Sorry, my bad.
>
>> The question here is how to represent more complex concepts than
>> simply points in time. I think the two concepts under discussion are
>> a) a composite type representing a point in time and a timezone it
>> should be interpreted in for operations and display and b) the
>> original input provided which is a text string with the constraint
>> that it's a valid input which can be interpreted as a point in time.
>
> My fear with A is that something could change that would make it impossible to actually get back to the time that was originally entered. For example, a new version of the timezone database could change something. Though, that problem also exists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now.

The common problem is daylight savings time being declared or
cancelled. This happens numerous times throughout the year, often with
short notice.

If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that
to keep meaning 6pm Fiji time no matter what decisions the Fijian
government makes over the next two years, you need to store the
wallclock (local) time and the timezone. The wallclock time remains
fixed, but the conversion to UTC may float.

If you are storing an point in time that remains stable no matter
future political decisions, you store UTC time and an offset. The
conversion to wallclock time may float, and your 6pm Fiji time meeting
might change to 5pm or 7pm depending on the policical edicts.

If you are only storing past events, its not normally an issue but
timezone information does occasionally get changed retroactively if
errors are discovered.

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Alexey Klyukin <alexk(at)commandprompt(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-17 20:26:25
Message-ID:	245C8D60-8C50-46DA-85D5-9446F7C4BFD1@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 6, 2011, at 9:24 AM, Alexey Klyukin wrote:
> So, I'd think there are 2 reasonable approaches to storing the
> timezone part:
>
> 1. Store the timezone abbreviation (i.e. 'EST' along w/ the timestamp
> data).
> 2. Assign OID to each of the timezones and store it w/ the timestamp.
>
> The first option seem to avoid the necessity of creating a new system
> catalog for timezone information and the burden of updating it,
> because current implementation is already capable of translating
> abbreviations to useful timezone information. The question is, whether
> just a TZ abbreviation is sufficient to uniquely identify the timezone
> and get the offset and DST rules. If it's not sufficient, how
> conflicting TZ short names are handled in the current code (i.e. 'AT
> TIME ZONE ...')?

It's not enough. See the timezone_abbreviations setting.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Stuart Bishop <stuart(at)stuartbishop(dot)net>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-18 05:29:37
Message-ID:	CA+TgmoZhPmpqGRMtEbK8oTmwCecvf0XRkXKTY9Y5OimgrDyt7g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jul 8, 2011 at 11:13 AM, Stuart Bishop <stuart(at)stuartbishop(dot)net> wrote:
> On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
>>> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>>>
>>>> I'm torn between whether the type should store the original time or the original time converted to GMT.
>>>
>>> This is the wrong way to think about it. We *never* store time
>>> "converted to GMT". When we want to represent a point in time we
>>> represent it as seconds since the epoch.
>> Right. Sorry, my bad.
>>
>>> The question here is how to represent more complex concepts than
>>> simply points in time. I think the two concepts under discussion are
>>> a) a composite type representing a point in time and a timezone it
>>> should be interpreted in for operations and display and b) the
>>> original input provided which is a text string with the constraint
>>> that it's a valid input which can be interpreted as a point in time.
>>
>> My fear with A is that something could change that would make it impossible to actually get back to the time that was originally entered. For example, a new version of the timezone database could change something. Though, that problem also exists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now.
>
> The common problem is daylight savings time being declared or
> cancelled. This happens numerous times throughout the year, often with
> short notice.
>
> If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that
> to keep meaning 6pm Fiji time no matter what decisions the Fijian
> government makes over the next two years, you need to store the
> wallclock (local) time and the timezone. The wallclock time remains
> fixed, but the conversion to UTC may float.
>
> If you are storing an point in time that remains stable no matter
> future political decisions, you store UTC time and an offset. The
> conversion to wallclock time may float, and your 6pm Fiji time meeting
> might change to 5pm or 7pm depending on the policical edicts.

This is a pretty good point. You would want the first of these
(probably) for the time a meeting is scheduled to start, and the
second for the time of a meteor shower (or some other natural event
that doesn't care what the politicians decide).

I feel like the second of these is pretty well handled by our existing
timestamptz data type. Granted, you can't store the intended display
time zone, but putting it in a separate column is not really a
problem: at least, it has the right semantics. So maybe the first is
the one we should be aiming at. If so, storing a counter and a time
zone is the wrong approach: you need to record something like <year,
month, day, hour, minute, second, tzname>.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Stuart Bishop <stuart(at)stuartbishop(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Herrera Alvaro <alvherre(at)commandprompt(dot)com>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-18 22:29:51
Message-ID:	AD161118-B0D6-4AEA-8703-0D55BF2E8611@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 18, 2011, at 12:29 AM, Robert Haas wrote:
> On Fri, Jul 8, 2011 at 11:13 AM, Stuart Bishop <stuart(at)stuartbishop(dot)net> wrote:
>> On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>> On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
>>>> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>>>>
>>>>> I'm torn between whether the type should store the original time or the original time converted to GMT.
>>>>
>>>> This is the wrong way to think about it. We *never* store time
>>>> "converted to GMT". When we want to represent a point in time we
>>>> represent it as seconds since the epoch.
>>> Right. Sorry, my bad.
>>>
>>>> The question here is how to represent more complex concepts than
>>>> simply points in time. I think the two concepts under discussion are
>>>> a) a composite type representing a point in time and a timezone it
>>>> should be interpreted in for operations and display and b) the
>>>> original input provided which is a text string with the constraint
>>>> that it's a valid input which can be interpreted as a point in time.
>>>
>>> My fear with A is that something could change that would make it impossible to actually get back to the time that was originally entered. For example, a new version of the timezone database could change something. Though, that problem also exists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now.
>>
>> The common problem is daylight savings time being declared or
>> cancelled. This happens numerous times throughout the year, often with
>> short notice.
>>
>> If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that
>> to keep meaning 6pm Fiji time no matter what decisions the Fijian
>> government makes over the next two years, you need to store the
>> wallclock (local) time and the timezone. The wallclock time remains
>> fixed, but the conversion to UTC may float.
>>
>> If you are storing an point in time that remains stable no matter
>> future political decisions, you store UTC time and an offset. The
>> conversion to wallclock time may float, and your 6pm Fiji time meeting
>> might change to 5pm or 7pm depending on the policical edicts.
>
> This is a pretty good point. You would want the first of these
> (probably) for the time a meeting is scheduled to start, and the
> second for the time of a meteor shower (or some other natural event
> that doesn't care what the politicians decide).
>
> I feel like the second of these is pretty well handled by our existing
> timestamptz data type. Granted, you can't store the intended display
> time zone, but putting it in a separate column is not really a
> problem: at least, it has the right semantics. So maybe the first is
> the one we should be aiming at. If so, storing a counter and a time
> zone is the wrong approach: you need to record something like <year,
> month, day, hour, minute, second, tzname>.

Right; you need a timestamp and you need to know what timezone that timestamp was entered in. That means you can always convert that time to whatever timezone you'd like (like timestamptz), but you also know what time was originally entered, and what timezone it was entered in. Technically you can do that with a separate field, but that seems really ugly to me.

So what we're proposing is a new data type that stores a timestamp as well as the timezone that that time was originally entered in. We can't just store a 3 letter timezone abbreviation, because the mapping from 3 letter TZs to actual TZs is not fixed (see the timezone_abbreviations GUC). I believe the best way to handle this is a system table that stores the name of every timezone that the database has ever loaded from the timezone data files, along with an OID. That means that the storage for this is just a timestamp and an OID.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: storing TZ along timestamps
Date:	2011-07-18 22:37:15
Message-ID:	4E24B59B.5020909@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim,

> Right; you need a timestamp and you need to know what timezone that timestamp was entered in. That means you can always convert that time to whatever timezone you'd like (like timestamptz), but you also know what time was originally entered, and what timezone it was entered in. Technically you can do that with a separate field, but that seems really ugly to me.

I disagree. It's a good mapping of the actual data.

The timestamp and the timezone in which that timestamp was entered are
two separate pieces of data and *ought* to be in two separate fields.
For one thing, the question of "what timezone was this entered in" is an
application-specific question, since you have three different potential
timezones:

* the actual client timezone
* the actual server timezone
* the application timezone if the application has configurable timezones

In a builtin data type, which of those three would you pick? Only the
application knows.

Additionally, if you have your timestamp-with-original-timezone data
type, then you're going to need to recode every single
timestamp-handling function and operator to handle the new type.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 02:22:30
Message-ID:	1311029571-sup-2590@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Josh Berkus's message of lun jul 18 18:37:15 -0400 2011:

> The timestamp and the timezone in which that timestamp was entered are
> two separate pieces of data and *ought* to be in two separate fields.
> For one thing, the question of "what timezone was this entered in" is an
> application-specific question, since you have three different potential
> timezones:
>
> * the actual client timezone
> * the actual server timezone
> * the application timezone if the application has configurable timezones
>
> In a builtin data type, which of those three would you pick? Only the
> application knows.

I think this whole discussion is built on the assumption that the client
timezone and the application timezone are one thing and the same; and
the server timezone is not relevant at all. If the app TZ is not the
client TZ, then the app will need fixed.

> Additionally, if you have your timestamp-with-original-timezone data
> type, then you're going to need to recode every single
> timestamp-handling function and operator to handle the new type.

I have my doubts about that, and I hope not. These details haven't been
discussed at all; I only started this thread to get community approval
on cataloguing the TZs.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Josh Berkus" <josh(at)agliodbs(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 14:24:55
Message-ID:	4E254D67020000250003F4D5@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> The timestamp and the timezone in which that timestamp was entered
> are two separate pieces of data and *ought* to be in two separate
> fields.

So, if you're grabbing a timestamp and the time zone for it, how do
you ensure you've done that atomically if you're at the boundary of
a DST change? The difficulty of grabbing both such that they are
guaranteed to correspond suggests to me that they really form a
single logical value.

> For one thing, the question of "what timezone was this entered in"
> is an application-specific question, since you have three
> different potential timezones:
>
> * the actual client timezone
> * the actual server timezone
> * the application timezone if the application has configurable
> timezones
>
> In a builtin data type, which of those three would you pick?

Well clearly the only one *PostgreSQL* would "pick" is one assigned
within the database server; otherwise, for a data type like this the
value coming over the wire should specify it.

If I want the client side value (in Java) it's easy enough to get
such a value. "new GregorianCalendar()" is described thusly:

| Constructs a default GregorianCalendar using the current time in
| the default time zone with the default locale.

How does Java assign those defaults? Why should PostgreSQL care?
It's got the means to do so for itself. The point is, people can
easily establish such a value on the client side; why not on the
server side?

> Only the application knows [whether it should pick the value or
> let the database pick it].

When are things otherwise? Obviously the application will assign it
or choose to let the server assign it (if that's the right thing).

> Additionally, if you have your timestamp-with-original-timezone
> data type, then you're going to need to recode every single
> timestamp-handling function and operator to handle the new type.

Why? I think you'd want to add some *new* casts and operators for
the new data type; I don't see why any existing ones would need to
be modified.

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 15:37:16
Message-ID:	23272.1311089836@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> The timestamp and the timezone in which that timestamp was entered
>> are two separate pieces of data and *ought* to be in two separate
>> fields.

> So, if you're grabbing a timestamp and the time zone for it, how do
> you ensure you've done that atomically if you're at the boundary of
> a DST change?

In my view of the world, the timezone that you are in is not an object
that changes across a DST boundary. So the above is a red herring.
It is only a problem if you insist on a broken concept of what a
timezone is.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 16:11:37
Message-ID:	4E256669020000250003F4E7@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> The timestamp and the timezone in which that timestamp was
>>> entered are two separate pieces of data and *ought* to be in two
>>> separate fields.
>
>> So, if you're grabbing a timestamp and the time zone for it, how
>> do you ensure you've done that atomically if you're at the
>> boundary of a DST change?
>
> In my view of the world, the timezone that you are in is not an
> object that changes across a DST boundary.

You're right -- the moment in time should be fixed like in the
current PostgreSQL "timestamp with time zone", and the time zone
doesn't change with DST. Not an intentional read herring, but
definitely some muddy thinking there.

That weakens the argument for such a data type. Even with that, I
suspect that its value as a convenience for application programmers
would be sufficient that an extension to provide such functionality
would get used. Essentially the current timestamptz bundled with a
time zone and which is, by default, displayed "at time zone" of the
attached time zone on output.

-Kevin

From:	Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 16:22:55
Message-ID:	CADCBc=hBBfcm=8fWrVP0EQTwQ-iqH6M-+T0m22ZN01ZKU9zggw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19 July 2011 17:11, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>>> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>>> The timestamp and the timezone in which that timestamp was
>>>> entered are two separate pieces of data and *ought* to be in two
>>>> separate fields.
>>
>>> So, if you're grabbing a timestamp and the time zone for it, how
>>> do you ensure you've done that atomically if you're at the
>>> boundary of a DST change?
>>
>> In my view of the world, the timezone that you are in is not an
>> object that changes across a DST boundary.
>
> You're right -- the moment in time should be fixed like in the
> current PostgreSQL "timestamp with time zone", and the time zone
> doesn't change with DST. Not an intentional read herring, but
> definitely some muddy thinking there.

There was an earlier point made that if someone puts eg 5pm local time
two years in the future into the database, and then the DST boundary
gets moved subsequently, some applications would like the value to
still say 5pm local time, even though that means it now refers to a
different point in absolute time - this potentially seems like a
useful feature. Retroactive timezone changes wouldn't make a lot of
sense in this case though...

I guess there are three concepts of time here - an absolute fixed time
with no reference to a timezone, a time with a timezone that is still
set as a fixed point in time, or a local time in a specific timezone
that would move if the timezone definition changed.

Ian

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 21:06:58
Message-ID:	4E25F1F2.1000807@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro, Kevin,

>> In a builtin data type, which of those three would you pick? Only the
>> application knows.
>
> I think this whole discussion is built on the assumption that the client
> timezone and the application timezone are one thing and the same; and
> the server timezone is not relevant at all. If the app TZ is not the
> client TZ, then the app will need fixed.

Not at all. Consider a hosted webapp where the user is allowed to set
their own timezone, but you use pooled connections. In that case, the
app is going to be handling user timezones with an AT TIME ZONE, not
with a SET TIMEZONE=""

> I have my doubts about that, and I hope not. These details haven't been
> discussed at all; I only started this thread to get community approval
> on cataloguing the TZs.

I am strongly in favor of having a *timezone* data type and some system
whereby we can uniquely identify timezones in the Zic database. That
would be tremendously useful for all sorts of things. I'm just
asserting that those who want a composite timestamp+saved-time-zone data
type have not thought about all of the complications involved.

> So, if you're grabbing a timestamp and the time zone for it, how do
> you ensure you've done that atomically if you're at the boundary of
> a DST change? The difficulty of grabbing both such that they are
> guaranteed to correspond suggests to me that they really form a
> single logical value.

Not relevant, given that (hopefully) the conception of a time zone
should exist independantly of whether it's currently in DST or not.
That is, the time zone is NOT "-07". The time zone is "US/Pacific".

> Why? I think you'd want to add some *new* casts and operators for
> the new data type; I don't see why any existing ones would need to
> be modified.

That would work too. What I'm pointing out is that we can't implement
the new type using just one-line modifications to the old operators and
functions.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-19 21:20:55
Message-ID:	CABFD9BF-10EA-49C3-807A-4573E14A6E26@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 19, 2011, at 2:06 PM, Josh Berkus wrote:

> I am strongly in favor of having a *timezone* data type and some system
> whereby we can uniquely identify timezones in the Zic database.

CREATE OR REPLACE FUNCTION is_timezone(
tz CITEXT
) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$
BEGIN
PERFORM NOW() AT TIME ZONE tz;
RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
RETURN FALSE;
END;
$$;

CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) );

Best,

David

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	David E(dot) Wheeler <david(at)kineticode(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 21:39:23
Message-ID:	0D1368A8-B01B-43CC-A8F9-E7591FDE1B43@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 19, 2011, at 4:20 PM, David E. Wheeler wrote:
> On Jul 19, 2011, at 2:06 PM, Josh Berkus wrote:
>
>> I am strongly in favor of having a *timezone* data type and some system
>> whereby we can uniquely identify timezones in the Zic database.
>
> CREATE OR REPLACE FUNCTION is_timezone(
> tz CITEXT
> ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$
...
> CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) );

Storing giant globs of text with every timestamp field is really ugly.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 21:44:14
Message-ID:	C6631462-901B-4C26-8C71-D38C65865DB7@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 19, 2011, at 11:22 AM, Ian Caulfield wrote:
> On 19 July 2011 17:11, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>>>> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>>>> The timestamp and the timezone in which that timestamp was
>>>>> entered are two separate pieces of data and *ought* to be in two
>>>>> separate fields.
>>>
>>>> So, if you're grabbing a timestamp and the time zone for it, how
>>>> do you ensure you've done that atomically if you're at the
>>>> boundary of a DST change?
>>>
>>> In my view of the world, the timezone that you are in is not an
>>> object that changes across a DST boundary.
>>
>> You're right -- the moment in time should be fixed like in the
>> current PostgreSQL "timestamp with time zone", and the time zone
>> doesn't change with DST. Not an intentional read herring, but
>> definitely some muddy thinking there.
>
> There was an earlier point made that if someone puts eg 5pm local time
> two years in the future into the database, and then the DST boundary
> gets moved subsequently, some applications would like the value to
> still say 5pm local time, even though that means it now refers to a
> different point in absolute time - this potentially seems like a
> useful feature. Retroactive timezone changes wouldn't make a lot of
> sense in this case though...

Right; and timezone's aren't supposed to change retroactively. The ZIC database is specifically setup so that it knows the history of TZ changes and deals with the past correctly.

> I guess there are three concepts of time here - an absolute fixed time
> with no reference to a timezone, a time with a timezone that is still
> set as a fixed point in time, or a local time in a specific timezone
> that would move if the timezone definition changed.

Or, another way to put the third class: a timestamp that remembers what it's original timezone was so that you can refer to it a common timezone (such as UTC), OR you can refer to it at it's original, local time. That's our exact need for this: we have different businesses that operate in different timezones. Generally, we only care about things in local time, but there are cases (such as event logging) where we could care about local *OR* unified time.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 21:44:17
Message-ID:	C4B20998-C9F2-434B-BEBD-56310C7BDE59@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 21, 2011, at 2:39 PM, Jim Nasby wrote:

>> CREATE OR REPLACE FUNCTION is_timezone(
>> tz CITEXT
>> ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$
> ...
>> CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) );
>
> Storing giant globs of text with every timestamp field is really ugly.

You work with what you've got.

David

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 21:48:27
Message-ID:	FAF060BA-CB21-4C1B-9484-D5D96370FDA3@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote:
>> I have my doubts about that, and I hope not. These details haven't been
>> discussed at all; I only started this thread to get community approval
>> on cataloguing the TZs.
>
> I am strongly in favor of having a *timezone* data type and some system
> whereby we can uniquely identify timezones in the Zic database. That
> would be tremendously useful for all sorts of things. I'm just
> asserting that those who want a composite timestamp+saved-time-zone data
> type have not thought about all of the complications involved.

Having to deal with timezone's completely separate from their timestamps is a huge PITA. That said, if we had a timezone datatype there's at least the possibility of using a composite type to deal with all of this. Or at least we can just create a custom datatype using existing tools... the only part of this that I see that actually requires closer core support is the timezone data itself.

So if the community is OK with adding a timezone datatype then we can focus on that and leave the timestamptztz data type as an add-on (at least assuming we don't run into any gotchas).

Alvaro, please speak up if there's any technical issues here that I've missed?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 21:52:57
Message-ID:	21508.1311285177@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> On Jul 19, 2011, at 11:22 AM, Ian Caulfield wrote:
>> There was an earlier point made that if someone puts eg 5pm local time
>> two years in the future into the database, and then the DST boundary
>> gets moved subsequently, some applications would like the value to
>> still say 5pm local time, even though that means it now refers to a
>> different point in absolute time - this potentially seems like a
>> useful feature. Retroactive timezone changes wouldn't make a lot of
>> sense in this case though...

> Right; and timezone's aren't supposed to change retroactively. The ZIC database is specifically setup so that it knows the history of TZ changes and deals with the past correctly.

You haven't noticed that at least two or three times a year, there are
"historical corrections" in the ZIC database? The mapping between local
time and UTC might be less likely to change for a time instant in the
past than one in the future, but it would be folly to assume that it's
immutable in either direction.

regards, tom lane

From:	Christopher Browne <cbbrowne(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 22:30:48
Message-ID:	CAFNqd5W5smcK-TjN0BNWtxk2PyK4=1kDa5GbJT44cLZOfpmQ4A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 21, 2011 at 5:48 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote:
>>> I have my doubts about that, and I hope not. These details haven't been
>>> discussed at all; I only started this thread to get community approval
>>> on cataloguing the TZs.
>>
>> I am strongly in favor of having a *timezone* data type and some system
>> whereby we can uniquely identify timezones in the Zic database. That
>> would be tremendously useful for all sorts of things. I'm just
>> asserting that those who want a composite timestamp+saved-time-zone data
>> type have not thought about all of the complications involved.
>
> Having to deal with timezone's completely separate from their timestamps is a huge PITA. That said, if we had a timezone datatype there's at least the possibility of using a composite type to deal with all of this. Or at least we can just create a custom datatype using existing tools... the only part of this that I see that actually requires closer core support is the timezone data itself.
>
> So if the community is OK with adding a timezone datatype then we can focus on that and leave the timestamptztz data type as an add-on (at least assuming we don't run into any gotchas).

As I have been watching this whole thread, my inclination has been to
look at this from a "Prolog" perspective, where we think about the
database as indicating a series of assertions about facts, from which
we then try to reason.

I suspect that determining what *really* needs to get recorded depends
on this. And it seems to me that trying to head down the path of
defining oid-based lookups of timezone names may be putting the cart
before the horse.

There are a number of facts about a timestamp:

1. What time did the database server think it was?

SELECT NOW();

captures the database's concept of what time it was, complete with:
a) The time, based, I think, on UT1. With the caveat that there's no
certainty that the DB server's time is necessarily correct.
b) An encoding of the timezone offset based on the value of the
TimeZone GUC for this connection.

If one is running an NTP daemon, pointing to a decently-connected
network of NTP servers, then it's likely that this time is pretty
accurate. And most of the time, I'd be inclined to treat this as
authoritative, and contend that anything else is likely to be less
correct and less easy to work with.

The goal of this discussion thread is to record another timestamp with
a different basis. It's not entirely clear what is its basis. I'll
suggest one, which mostly underlines my contention that it's likely
"less correct" and "less easy to work with" than having a column
defined as...
some_timestamp timestamp with timezone default NOW()

2. Client-based timestamp, comprising two things:

a) A time, ascertained by the client.
b) A timezone, ascertained by the client.

Note that timezones are pretty open-ended. There is an authoritative
encoding defined in the tz database, but there are other values used
out there. We had to patch Slony-I to have it use 'ISO' timestamps,
and recommend running in GMT/UTC, because there are values that blow
things up.

For instance, on AIX, there is a habit for boxes to set TZ=CUT0, out
of the box, which isn't on what PostgreSQL considers to be the
"official list."

On the more whimsical side of things, Joey Hess, a Debian developer
noted for such things as ikiwiki, etckeeper, git-annex, decided to
create his very own custom timezone, "JEST", because he was irritated
about DST.

http://kitenet.net/~joey/blog/entry/howto_create_your_own_time_zone/
http://kitenet.net/~joey/blog/entry/JEST_results/

That "whimsical" entry won't be going into tzdata, and while we could
discount this case as whimsy, it's not "out there" for organizations
such as nation states to decide to legislate their own things, that we
can't be certain will necessarily get into tzdata.

There are enough aliases and possibilities of local national decisions
to make it at least somewhat troublesome to treat this as something
that can be considered fixed down to the OID level.

My conclusion would be that if someone is really, really, really keen
on capturing their own notion of timezone, then this fits with the
notion that, if they want to have something that could be treated as
remotely authoritative, they should capture a multiplicity of pieces
of datestamp information, and actively accept that this will be pretty
duplicative.

- I'd commend capturing NOW() in a timestamptz field. That gives you:
1. What time the DB server thought it was, in terms of UT1
2. What timezone it thought was tied to that connection.
- Also, I'd be inclined to capture, in plain text form:
3. A client-recorded timestamp. I'm agnostic as to whether this has
*any* validation done on it; I'd think it plausible that this is
simply a text field, that might require a human to interpret it.
4. A client-recorded timezone. This would be a plain text field, and
I'm not certain it's of any particular value to try to validate it
against any would-be authoritative list. Why shouldn't Joey be able
to use JEST?
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-21 23:05:36
Message-ID:	1F4E5F93-3241-4420-B54C-5A368860FF69@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote:
> - I'd commend capturing NOW() in a timestamptz field. That gives you:
> 1. What time the DB server thought it was, in terms of UT1
> 2. What timezone it thought was tied to that connection.

Except that it doesn't, and that's exactly the problem I'm trying to solve here. I want to know what timezone we were using when we put a value into timestamptz, which then got converted to UT1. Without a reliable way to store what the timezone *was* at that time, we have no way to go back to it.

Now, we can debate whether it makes more sense to store the original time without conversion to UT1, or whether we should store the time after converting it to UT1 (or whether we should offer both options), but that debate is pointless without a good way to remember what timezone it started out in.

Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft to me: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 byte timestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with what the database has available in it's ZIC database.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-22 15:33:09
Message-ID:	CA+TgmoYu9jCZQ7mGuKEjxwWNKfs9oJYs=0Z3qHaKOZ_k+VEAiw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 21, 2011 at 7:05 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote:
>> - I'd commend capturing NOW() in a timestamptz field. That gives you:
>> 1. What time the DB server thought it was, in terms of UT1
>> 2. What timezone it thought was tied to that connection.
>
> Except that it doesn't, and that's exactly the problem I'm trying to solve here. I want to know what timezone we were using when we put a value into timestamptz, which then got converted to UT1. Without a reliable way to store what the timezone *was* at that time, we have no way to go back to it.
>
> Now, we can debate whether it makes more sense to store the original time without conversion to UT1, or whether we should store the time after converting it to UT1 (or whether we should offer both options), but that debate is pointless without a good way to remember what timezone it started out in.
>
> Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft to me: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 byte timestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with what the database has available in it's ZIC database.

You have those same problems trying to include the time zone
information in some new timestampreallyhasthetz data type, though.

This problem reminds me a great deal of the problems associated with
managing security labels for SE-Linux. There aren't that many
distinct values, so ideally it would be nice to store an OID -> string
mapping somewhere and just store the OIDs in the main table. But a
new security label can appear at any time, and it doesn't work to have
the transaction that discovers it do the insert into the mapping
table. Time zones have the same problem, more or less. Now, maybe if
we had non-transactional tables like Alvaro keeps muttering about...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-22 15:51:03
Message-ID:	1311349760-sup-7761@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of vie jul 22 11:33:09 -0400 2011:

> This problem reminds me a great deal of the problems associated with
> managing security labels for SE-Linux. There aren't that many
> distinct values, so ideally it would be nice to store an OID -> string
> mapping somewhere and just store the OIDs in the main table. But a
> new security label can appear at any time, and it doesn't work to have
> the transaction that discovers it do the insert into the mapping
> table. Time zones have the same problem, more or less. Now, maybe if
> we had non-transactional tables like Alvaro keeps muttering about...

Oh, I wasn't relating these problems to non transactional tables ...
Hmm.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-22 21:54:42
Message-ID:	4E29F1A2.9080705@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim,

> Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft to me: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 byte timestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with what the database has available in it's ZIC database.

Sure, although there's no getting around the portability issues. The
moment you move that data between servers, you risk having specific
timezones not be available on the second server. Or worse, be available
but have a different definition -- if, for example, you're running a
more/less updated PostgreSQL on the second server.

I'm not saying that this isn't worth solving. I could really use a
timezone datatype which was synched with ZIC in some way, and so could a
lot of other users, whether or not a timestamp + original timezone type
is available as well. But don't underestimate the scope of the problem.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-25 22:26:02
Message-ID:	847D5B86-0206-45D5-AEF0-BF6EE06E4A37@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 22, 2011, at 10:33 AM, Robert Haas wrote:
> On Thu, Jul 21, 2011 at 7:05 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote:
>>> - I'd commend capturing NOW() in a timestamptz field. That gives you:
>>> 1. What time the DB server thought it was, in terms of UT1
>>> 2. What timezone it thought was tied to that connection.
>>
>> Except that it doesn't, and that's exactly the problem I'm trying to solve here. I want to know what timezone we were using when we put a value into timestamptz, which then got converted to UT1. Without a reliable way to store what the timezone *was* at that time, we have no way to go back to it.
>>
>> Now, we can debate whether it makes more sense to store the original time without conversion to UT1, or whether we should store the time after converting it to UT1 (or whether we should offer both options), but that debate is pointless without a good way to remember what timezone it started out in.
>>
>> Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft to me: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 byte timestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with what the database has available in it's ZIC database.
>
> You have those same problems trying to include the time zone
> information in some new timestampreallyhasthetz data type, though.
>
> This problem reminds me a great deal of the problems associated with
> managing security labels for SE-Linux. There aren't that many
> distinct values, so ideally it would be nice to store an OID -> string
> mapping somewhere and just store the OIDs in the main table. But a
> new security label can appear at any time, and it doesn't work to have
> the transaction that discovers it do the insert into the mapping
> table. Time zones have the same problem, more or less. Now, maybe if
> we had non-transactional tables like Alvaro keeps muttering about...

Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforce that we'll only recognize new TZ info as part of a config reload?

Josh Berkus also made a good point that this does introduce the risk that you could end up moving data to a different server, that has a different ZIC database (perhaps via replication); at which point the fit could hit the shan (or the excrement could impact the cooling device...). So perhaps the only reasonable way to handle this is to actually load ZIC data into the database itself.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-25 23:22:36
Message-ID:	CA+TgmoYivPhmhM=5tmf239Gz4dEAt_SeHuQ9cOcnzYvipVaX8Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforce that we'll only recognize new TZ info as part of a config reload?

Hmm. That might work in theory, but I don't see any good way to
update every database's tz table on each reload.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-26 22:45:57
Message-ID:	36920C89-358A-4901-8AC0-D00EDFC8C31F@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 25, 2011, at 6:22 PM, Robert Haas wrote:
> On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforce that we'll only recognize new TZ info as part of a config reload?
>
> Hmm. That might work in theory, but I don't see any good way to
> update every database's tz table on each reload.

I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want all of them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foresee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do that with race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do.

Or are you seeing a problem I'm missing?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Christopher Browne <cbbrowne(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-26 22:56:46
Message-ID:	CAFNqd5VYkgYtUe_0CAvHOr8tdFjqJBdUknNOyF2wgHGZyn=GKA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 26, 2011 at 6:45 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jul 25, 2011, at 6:22 PM, Robert Haas wrote:
>> On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforce that we'll only recognize new TZ info as part of a config reload?
>>
>> Hmm. That might work in theory, but I don't see any good way to
>> update every database's tz table on each reload.
>
> I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want all of them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foresee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do that with race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do.
>
> Or are you seeing a problem I'm missing?

What if 4 backends concurrently are the "first ones" to try to
simultaneously add "South America/Ruritania", and...

1. Connection #1 came in 'first', but rolls back its transaction.
2. Connection #2 came in 'second', and also winds up rolling back its
transaction because the connection fails due to a network problem.
3. Connection #3 actually completes. But doesn't commit until after #4.
4. Connection #4 started last, but turns out to COMMIT first.

The "merge" is a pretty bad one. They all have to try to succeed, and
in some way that doesn't block things.

Perhaps the TZ values need to not be UNIQUE, but some process can come
in afterwards and rewrite to drop out non-unique values. That's not
very nice either; that means you can't use a FK reference against the
TZ table.

Or you need to have something that comes in afterwards and repoints
tuples to the *real* TZ entry, which seems likely to be troublesome.

This just gets so ugly so fast; the attempt to save space by storing a
pointer to the TZ value is just filled with trouble (and potential
#fail).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: storing TZ along timestamps
Date:	2011-07-26 23:30:11
Message-ID:	CDCC2614-6AFB-4807-8181-753E7F7FD046@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 26, 2011, at 5:56 PM, Christopher Browne wrote:
>> I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want all of them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foresee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do that with race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do.
>>
>> Or are you seeing a problem I'm missing?
>
> What if 4 backends concurrently are the "first ones" to try to
> simultaneously add "South America/Ruritania", and...
>
> 1. Connection #1 came in 'first', but rolls back its transaction.
> 2. Connection #2 came in 'second', and also winds up rolling back its
> transaction because the connection fails due to a network problem.
> 3. Connection #3 actually completes. But doesn't commit until after #4.
> 4. Connection #4 started last, but turns out to COMMIT first.

Ugh, I didn't realize that a reload would take effect in the middle of a transaction. That certainly kills what I proposed.

Though, now I'm wondering why this info would need to be in every database anyway... certainly this should be treated as global data, and if that's the case then only one process needs to update it. Though I'm not sure if it's possible for global data to be ACID.

Anyway, rather than continue this on-list, I'm going to get Alvaro to think about it in more detail and see what he comes up with.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net