Re: Timezones (in 8.5?)

Lists: pgsql-hackers
From: hernan gonzalez <hgonzalez(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Timezones (in 8.5?)
Date: 2009-11-17 15:21:14
Message-ID: 48692c2d0911170721h50f40141m7a1dc991258ef07a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Are there any plans to (is anybody working on) implement better
timezone support in postgresql
for 8.5 ? Specifically, store the timezone info -instead of just the
timestamp as UTC ?
http://wiki.postgresql.org/wiki/Todo#Dates_and_Times

Hernán J. González


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: hernan gonzalez <hgonzalez(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-17 15:48:19
Message-ID: 603c8f070911170748m219d7901raba0a44283827d74@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 17, 2009 at 10:21 AM, hernan gonzalez <hgonzalez(at)gmail(dot)com> wrote:
> Are there any plans to (is anybody working on) implement better
> timezone support in postgresql
> for 8.5 ? Specifically, store the timezone info -instead of just the
> timestamp as UTC ?
> http://wiki.postgresql.org/wiki/Todo#Dates_and_Times

You might want to use the word "different" rather than the word
"better", because the current behavior is quite useful and I think
many people would be unhappy if it were to go away.

I think there's also some debate about whether we want this at all. See here:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg00964.php

One random thought - I am not aware that we currently have a "time
zone" type in which to store a time zone in. Is there any value in
having such a thing vs. just using varchar?

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: hernan gonzalez <hgonzalez(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-17 16:13:39
Message-ID: 29475.1258474419@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> One random thought - I am not aware that we currently have a "time
> zone" type in which to store a time zone in. Is there any value in
> having such a thing vs. just using varchar?

The main potential advantage seems to be faster lookup of the zone's
associated data ... but I think we already keep the data in a hashtable
indexed by hash of the zone name, so the gain might be pretty marginal.

A specialized type *might* provide some notational advantage for writing
operators, eg maybe "timestamp @ zone" would be sensible. But this is
speculative without some clearer idea of what operations you'd want.
And anyway it's not clear that text wouldn't work just as well there.

Perhaps the OP should explain exactly what real-world problems he's
trying to solve. As noted in the discussion you linked, there's not
a lot of enthusiasm around here for getting closer to the spec's
datetime handling simply because it's the spec; that part of the spec
is just too broken for that to be a credible argument.

regards, tom lane


From: hernan gonzalez <hgonzalez(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-17 20:31:54
Message-ID: 48692c2d0911171231h6ab16a64yc4db35a6e26909e0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Perhaps the OP should explain exactly what real-world problems he's
> trying to solve.  As noted in the discussion you linked, there's not
> a lot of enthusiasm around here for getting closer to the spec's
> datetime handling simply because it's the spec; that part of the spec
> is just too broken for that to be a credible argument.

I'm not much interested in the compliance with the ANSI SQL spec, I
agree in this regard it is unsatisfactory (to put it midly).
But I'm also disatisfied with the current Postgresql implementation,
the types TIMESTAMP and TIMESTAMP WITH TIMEZONE are in the middle of
being SQL compliant and being really useful. The support of timezones
is really crippled now.

I understand, though, that backward compatibily is critical, and I'm
surely unaware of many implementation issues.
Anyway (long rambling follows - and excuse my english)...

We know that, even ignoring ANSI spec and postgresql compatibility for
one moment, even before considering date-time arithmetic and DTS
issues, date-time handling is notoriously difficult to formalize
satisfactorily. And, come to look at it, it's not a Postgresql
problem, nor a SQL problem: I believe there is NO standard for
store/serialize/represent a "date-time value", with all the
complexities that the concept has in human usage (ISO 8601, as
ANSI-SQL, just considers GMT offsets, not real timezones).

Let me present a simple real world scenario -to look at not from the
implementation point of view, but from the user:

- John records in his calendar a reminder for some event at datetime
2010-Jul-27, 10:30:00, with TZ "Chile/Santiago", (GMT+4 hence it
corresponds to UTC time 2010-Jul-27 14:30:00). But some days
afterwards, his government decides to change the country TZ to GMT+5.

Now, when the day comes... should that reminder trigger at
A) 2010-Jul-27 10:30:00 "Chile/Santiago" = UTC time 2009-Jul-27 15:30:00
or
B) 2010-Jul-27 9:30:00 "Chile/Santiago" = UTC time 2009-Jul-27 14:30:00 ?

There is no correct answer, unless one knows what John actually meant
when he said "please ring me at "2010-Jul-27, 10:30:00
TZ=Chile/Santiago"
Did he mean a "civil date-time" ("when the clocks in my city tell
10:30")? In that case, A) is the correct answer.
Or did he mean a "physical instant of time", a point in the continuus
line of time of our universe, say, "when the next solar eclipse
happens". In that case, answer B) is the correct one.

I believe that this distinction between two realms: one related to
(say) "physical time" and the other to (say) "civil date-time", is the
key to put some order... conceptually, at least (I'm not speaking
about feasibility for now). This is the approach of some Date-Time
APIs, for example the "Joda" Java library
http://joda-time.sourceforge.net/ (headed to replace soon
https://jsr-310.dev.java.net/ the original ugly JDK Date-Calendar
API) and I believe it's the right way.

In this approach, we would have two entirely different types (or
family of types) -no castings allowed.
An "instant" is a "physical time", a point in the time continuum.
A "partial date time spec" (or "partial civil datetime") is just a
tuple of values {year,month,day, hour,min,sec,usec,TZ_id} some of
which might be empty/unspecified.
Conversion from "instant" to "civil datetime" is only allowed if a TZ
is also specified (well, also a "Calendar" spec, if non-gregorian
dates are to be dealt with).
Conversion from "partial civil datetime" to "instant" is only allowed
if all fields are non-empty (again, assuming a "Calendar").
Similar distintion goes for "intervals" or "durations".

Postgresql implementation (and ANSI-SQL), for all date-time data,
revolves around the "physical time" concept: that is what it is
ultimately stored, that's what it's tought as the "real thing" (the
rest are input/output and arithmetic issues).
(Rather disgressing: even the DATE type is treated as a point in time,
as a DateTime with time=00:00:00 ; I think this is bad, conceptually,
when I think of "2010-Jul-27" I think of a date, not of the instant of
time "2010-Jul-27 00:00:00", they are different concepts; this is NOT
analogous to INT 10 => FLOAT 10.0 )
Because of this (IMHO) conceptual limitation, the availabily of the
two types "TIMESTAMP" "TIMESTAMP WITH TIME ZONE" results,
unfortunately, much less useful than it could have been.

If I were to reimplement the date-time data types, without much
regarding ANSI-SQL standard and Postgresql compatibility (a little too
much to ask, I know) I'd propose:

TIMESTAMP: ("instante") just a point in time, purely physical (as it
name suggest!). UTC encoded.
(input format could accept unix time or standard datetime format, with
default/server TZ; output format could output explicit GMT offset, to
support dump/restore robustly)

DATETIME: (call it "TIMESTAMP WITH TIME ZONE" if you wish but... is a
very different thing)
a full "civil" date time specification {year,month,day,
hour,min,sec,usec,TZ} (Of course, internally it could be stored as UTC
+ TZ_id )
Can be converted to TIMESTAMP, (but no casting allowed!), but the
result might vary if the TZ tables are changed (see my example above).

LOCAL_DATETIME: a full "civil" date time specification
{year,month,day, hour,min,sec,usec} with NO TZ.
Cannot be converted to DATETIME (or TIMESTAMP), except if a TZ_id is
also specified.
(This type might be merged with DATETIME if we allow empty TZ_id
values; but NEVER assume the server TZ as default TZ when empty!)

DATE: just a civil "date" {year,month,date} . Can be converted to
LOCAL_DATIME only by adding the remaining fields.

And similar for intervals...

Of course, there are many implementation details (some "time zone"
type or codification - efficient caching of DATETIME
operations/conversions - lots of input/output formatting issues,
interfaces, dump/restore) - and above all, compatibily with pg and
ANSI. Anyway, I dream of seing Postgresql going this way :-)

Best regards

Hernán J. González
Buenos Aires, Argentina
http://hjg.com.ar/


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: hgonzalez(at)gmail(dot)com (hernan gonzalez), pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-18 04:21:35
Message-ID: 87d43geaud.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "hernan" == hernan gonzalez <hgonzalez(at)gmail(dot)com> writes:

>> Perhaps the OP should explain exactly what real-world problems
>> he's trying to solve.  As noted in the discussion you linked,
>> there's not a lot of enthusiasm around here for getting closer to
>> the spec's datetime handling simply because it's the spec; that
>> part of the spec is just too broken for that to be a credible
>> argument.

hernan> I'm not much interested in the compliance with the ANSI SQL
hernan> spec, I agree in this regard it is unsatisfactory (to put it
hernan> midly). But I'm also disatisfied with the current Postgresql
hernan> implementation, the types TIMESTAMP and TIMESTAMP WITH
hernan> TIMEZONE are in the middle of being SQL compliant and being
hernan> really useful. The support of timezones is really crippled
hernan> now.

Crippled how?

The example you gave is easily handled in pg as follows:

hernan> - John records in his calendar a reminder for some event at
hernan> datetime 2010-Jul-27, 10:30:00, with TZ "Chile/Santiago",
hernan> (GMT+4 hence it corresponds to UTC time 2010-Jul-27
hernan> 14:30:00). But some days afterwards, his government decides
hernan> to change the country TZ to GMT+5.

hernan> Now, when the day comes... should that reminder trigger at
hernan> A) 2010-Jul-27 10:30:00 "Chile/Santiago" = UTC time 2009-Jul-27 15:30:00
hernan> or
hernan> B) 2010-Jul-27 9:30:00 "Chile/Santiago" = UTC time 2009-Jul-27 14:30:00 ?

hernan> There is no correct answer, unless one knows what John
hernan> actually meant when he said "please ring me at "2010-Jul-27,
hernan> 10:30:00 TZ=Chile/Santiago" Did he mean a "civil date-time"
hernan> ("when the clocks in my city tell 10:30")? In that case, A)
hernan> is the correct answer. Or did he mean a "physical instant of
hernan> time", a point in the continuus line of time of our universe,
hernan> say, "when the next solar eclipse happens". In that case,
hernan> answer B) is the correct one.

If he meant (A), then you store the event as:
(ts,tz) = (timestamp '2010-07-27 10:30:00',
'Chile/Santiago')
and decide when it happens using (ts at time zone tz), evaluated on
the fly. This way, when you install an update in your zic database to
cope with the change of tz, the computed value of the physical time
changes, but it still shows the same calendar time.

If he meant (B), then you store the event as
(tsz,tz) = (timestamp '2010-07-27 10:30:00' at time zone 'Chile/Santiago',
'Chile/Santiago')
(note that tsz is now of type timestamp with time zone). This fixes the
physical time, and when you install the zic update, the displayed calendar
time changes, in order to keep the physical time the same.

If you're writing a calendaring app that wants to allow storing both kinds
of events (I've yet to see such an app that actually makes this distinction,
most seem to work on the assumption that timezones don't change), all the
tools for it are currently available in postgres.

--
Andrew (irc:RhodiumToad)


From: hernan gonzalez <hgonzalez(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-18 13:50:44
Message-ID: 48692c2d0911180550m1727c0ebkdae4bf6bea443a22@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> hernan> The support of timezones is really crippled
>  hernan> now.
>
> Crippled how?

Well, among other things, no builtin date-timetype allows me to save
the timezone (or even the offset).
No type allows to treat this three datetimes as different values.
'2010-07-27 10:30 GMT+4' '2010-07-27 09:30 GMT+5' '2010-07-27 10:30 GMT+0'
The ANSI spec at least permits that.

> The example you gave is easily handled in pg as follows:

Well, using compound types one can handle practically everything...
My point is expressiveness. Basic datatypes should ideally correspond to
the most typical data that one which to store/retrive/manipulate in a DB.
And my claim is that most date-time values found in real life can be neatly
classified in the types I mentioned (basically: physical instants of time, or
civil date-times), and that they should not be confused.
Hence, for example an operation as [TIMESTAMP] + "1 MONTH" should
not be allowed (incompatible types).
Hence, when I ask PG "store the datetime "2010-Jul-27, 10:30:00 (at TZ
Chile/Santiago)",
it should not do (as today) "ok, let me check the zic table for that
TZ... aha, offset +4,
so you meant the UTC time 2010-Jul-27 14:30:00 ...saved" But I didn't
mean that, I meant
what I said (a civil date). The "bridging" (conversion to physical
time) should only be made
when (if) needed.
I also claim, BTW, that the DB should never rely on its local TZ. If
some SQL query (eg: select all
orders confirmed in January) can return different sets by changing the
TZ of the DB server,
something is wrong.

> If you're writing a calendaring app that wants to allow storing both kinds
> of events (I've yet to see such an app that actually makes this distinction,
> most seem to work on the assumption that timezones don't change), all the
> tools for it are currently available in postgres.
>

I'd said that calendar events are the most typical case of civil
date-times (most other
date-times, i think, are in fact timestamps, i.e. physical times: eg
when a record was created,
a blog post, etc). When I record an appointment with my dentist at
"9:30 (at my TZ)" I'm not thinking
of a point of time, but a civil date-time. PG does not me allow to
save (cleanly and robustly) such
a basic data item. You must resort to a compound type, and plug the
semantic yourself.
I think that, if the date-time types were more consistent and natural,
there would be
no need to make assumptions about timezones specifications can change or not,
the issue would not arise. (BTW, in my country the timezones indeed change, and
most unpredictably ,sadly; but that's not my motivation)

Hernán J. González
Buenos Aires, Argentina
http://hjg.com.ar/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "hernan gonzalez" <hgonzalez(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-18 18:26:14
Message-ID: 4B03E7E6020000250002C9E9@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:

> If he meant (A), then you store the event as:
> (ts,tz) = (timestamp '2010-07-27 10:30:00',
> 'Chile/Santiago')

> If he meant (B), then you store the event as
> (tsz,tz) = (timestamp '2010-07-27 10:30:00' at time zone
> 'Chile/Santiago', 'Chile/Santiago')

You seem to be agreeing that these problems can't be solved without
storing a time zone string in addition to the timestamp. As I read
it, Hernán was wishing for types which include this, rather than
having to do the above dance with multiple values.

-Kevin


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Kevin(dot)Grittner(at)wicourts(dot)gov ("Kevin Grittner"), "hernan gonzalez" <hgonzalez(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-19 04:18:19
Message-ID: 87my2jcgvi.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Kevin" == "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:

>> If he meant (A), then you store the event as:
>> (ts,tz) = (timestamp '2010-07-27 10:30:00',
>> 'Chile/Santiago')

>> If he meant (B), then you store the event as
>> (tsz,tz) = (timestamp '2010-07-27 10:30:00' at time zone
>> 'Chile/Santiago', 'Chile/Santiago')

Kevin> You seem to be agreeing that these problems can't be solved
Kevin> without storing a time zone string in addition to the
Kevin> timestamp. As I read it, Hernán was wishing for types which
Kevin> include this, rather than having to do the above dance with
Kevin> multiple values.

Right, but including more data in a single type is the wrong approach,
since it complicates the semantics and interferes with normalization.
For example, if you have a type T which incorporates a timestamp and a
timezone, what semantics does the T = T operator have? What semantics
apply if the definitions of timezones change? What if you're storing
times of events at specific places; in that case you want to associate
the timezone with the _place_ not the event (so that if the timezone
rules change, moving the place from one timezone to another, you only
have to change the place, not all the events that refer to it).

--
Andrew (irc:RhodiumToad)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, hernan gonzalez <hgonzalez(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-19 14:12:08
Message-ID: 603c8f070911190612v1a6a589cy9be1c2f4a6b94a63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 18, 2009 at 11:18 PM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>>>>>> "Kevin" == "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>
>  >> If he meant (A), then you store the event as:
>  >> (ts,tz) = (timestamp '2010-07-27 10:30:00',
>  >> 'Chile/Santiago')
>
>  >> If he meant (B), then you store the event as
>  >> (tsz,tz) = (timestamp '2010-07-27 10:30:00' at time zone
>  >> 'Chile/Santiago', 'Chile/Santiago')
>
>  Kevin> You seem to be agreeing that these problems can't be solved
>  Kevin> without storing a time zone string in addition to the
>  Kevin> timestamp.  As I read it, Hernán was wishing for types which
>  Kevin> include this, rather than having to do the above dance with
>  Kevin> multiple values.
>
> Right, but including more data in a single type is the wrong approach,
> since it complicates the semantics and interferes with normalization.
> For example, if you have a type T which incorporates a timestamp and a
> timezone, what semantics does the T = T operator have? What semantics
> apply if the definitions of timezones change? What if you're storing
> times of events at specific places; in that case you want to associate
> the timezone with the _place_ not the event (so that if the timezone
> rules change, moving the place from one timezone to another, you only
> have to change the place, not all the events that refer to it).

Also, if someone DOES want to use these together, isn't that what
composite types are for?

...Robert


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: "hernan gonzalez" <hgonzalez(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-19 16:55:24
Message-ID: 4B05241F020000250002CA2F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Nov 18, 2009 at 11:18 PM, Andrew Gierth
> <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>>>>>>> "Kevin" == "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
writes:
>>
>> >> If he meant (A), then you store the event as:
>> >> (ts,tz) = (timestamp '2010-07-27 10:30:00',
>> >> 'Chile/Santiago')
>>
>> >> If he meant (B), then you store the event as
>> >> (tsz,tz) = (timestamp '2010-07-27 10:30:00' at time zone
>> >> 'Chile/Santiago', 'Chile/Santiago')
>>
>> Kevin> You seem to be agreeing that these problems can't be solved
>> Kevin> without storing a time zone string in addition to the
>> Kevin> timestamp. As I read it, Hernán was wishing for types
>> Kevin> which include this, rather than having to do the above
>> Kevin> dance with multiple values.
>>
>> Right, but including more data in a single type is the wrong
>> approach, since it complicates the semantics and interferes with
>> normalization.

Or, one could say, it encapsulates the semantics within the type's
operators, avoiding the need to repeat the logic everywhere, or to use
more verbose explicit function calls.

>> For example, if you have a type T which incorporates a timestamp
>> and a timezone, what semantics does the T = T operator have? What
>> semantics apply if the definitions of timezones change?

I'd rather sort that out once and implement the desired semantics in
the operators for a new type than to count on application programmers
doing it consistently each time. Wouldn't you?

>> What if you're storing times of events at specific places; in that
>> case you want to associate the timezone with the _place_ not the
>> event (so that if the timezone rules change, moving the place from
>> one timezone to another, you only have to change the place, not all
>> the events that refer to it).

I'm not sure I quite followed you there, but Hernán's example
specifically called for storing 'Chile/Santiago', not a UTC offset or
something as easily changed as the 'CLT' or 'CLST' time zone
designations -- so it is tied to a place rather more closely than
anything else. I think that was part of his point -- that for civil
time you care about what the clock on a typical business's wall at
that place will read on that date, regardless of what changes might
happen in time zone definitions.

> Also, if someone DOES want to use these together, isn't that what
> composite types are for?

I'm going to plead both ignorance and laziness here. My use of
composite types is limited, so I don't know, offhand, whether you can
define a set of operators for a composite type which will provide the
consistent behavior with convenient operators which Hernán seems to
want. If they allow that, then it certainly seems like the way to go,
so that the component parts of the abstraction we've been calling
civil time can be easily accessed. If not, they're not suited to what
Hernán wants (as I understand it).

For the record, this discussion has made me realize that I don't care
as much about including such information with tsz as with ts. The tsz
enhancement wouldn't change the semantics of the object at all, as far
as I can see, beyond it's default presentation when you turn it into a
string. That's worth something, but pales in comparison to the value
of the civil time concept, which would actually match the common usage
in scheduling business meetings and most other every-day activities.

I think the popularity of physical time is that it is so concrete.
The reality of usage of date and time, though, is that various
abstractions which aren't tightly coupled to physical time are common
and useful. The civil time issues are one aspect of that. (And as
far as I'm concerned, leap seconds can be totally ignored for civil
time -- there's a nice round clock up on my wall with a big hand and
a little hand and a second hand all spinning around, and there's no
place on that clock face for a 61st or 62nd second in any minute,
ever.) And those who don't think it's useful be able to add one month
to the 31st of January and get a date as a result to which you can add
one month and get the 31st of March -- well, come the cultural
revolution I plan to see to it that they do nothing but write
financial applications for five years.... :-)

-Kevin


From: hgonzalez(at)gmail(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-19 19:21:47
Message-ID: 000e0cd70ece3a08d90478be4580@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov 19, 2009 1:18am, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:

> Right, but including more data in a single type is the wrong approach,
> since it complicates the semantics and interferes with normalization.
> For example, if you have a type T which incorporates a timestamp and a
> timezone, what semantics does the T = T operator have? What semantics
> apply if the definitions of timezones change?

I dont get the thing about normalization, there's complete ortoghonality in
my approach.
And when you say "complicates the semantic" I'd say "enrich the semantics"
(and even
clarify it) so that it fits more neatly to the typical usage of dates and
times in real life.
For "datetimes with tz" the equality (and comparison) operator is not
trivial, roughly
in the same sense that date-time arithmetic is not trivial when one stops
thinking of
datetimes as "physical time". So is life. Should the datetimes
'2010-07-27 9:30 Chile' and '2010-07-27 10:30 Argentine' (GMT+4 and GMT+3
respec)
be considered equal? It's arguable; but the ambiguity (just a matter of
adoption) reflects
reality. We can discuss it and adopt some consistent criteria.

> What if you're storing
> times of events at specific places; in that case you want to associate
> the timezone with the _place_ not the event (so that if the timezone
> rules change, moving the place from one timezone to another, you only
> have to change the place, not all the events that refer to it).

I'm not sure I undestand you here. I'm claiming that timezone rules
alterations
(zic files changes) should always be supported by the db implementation,
without
needing of touching your data. And I believe that timestamps (ie physical
times)
are in practice almost never associated to timezone information. If you
want to store
"the instant of last solar eclipse" you normally store the timestamp, a
timezone
might only be useful for displaying (or as an adittional info, not really
associated to the event)
A border case would be "store the instant of the death of John Lennon". You
might
store the TZ here if you are interested in the civil time (so you can
answer, for example,
¿how many rock stars died in morning/afternoon?). But then, again, you are
here actually
storing a civil date (local date-time plus TZ). The only problematic case i
can envision
is to intend to store a physical time in the future with TZ, but frankly it
is difficult to
think of this scenario (and even more difficult to think of needing to
operate with that
data as a whole; hence, in this case, to store the two fields separatadely
makes sense).

I'm being dense, and this might be a lost cause, but anyway, perhaps some
day in the future
this might be of some use:

I strongly believe that, if one could sample the real needings and usage of
date-time types in
applications in this world, and taking apart types DATE (very frequent, but
rather straightforward),
and TIME (not so relevant) and intervals (other issues here, much related
to datetimes), the
overwhelming majority would fall ( conceptually) into these three types:

- TIMESTAMP (physical time - no TZ - no civil time implied)
- LOCAL DATETIME (civil time, no TZ)
- DATETIME (civil time with TZ => togheter with zic tables, implies a
physical time)

And of these three -I'd bet- the first is (conceptually) the most common,
by a wide margin.

As the name TIMESTAMP implies, it frequently records the moment of a event
(in the DB corresponds
frequently to the creation or alteration of a record, frequently via
a "now()" default or such).
Examples: the timestamp of messages in a mailing list, or issues in a
bugtracker, or posts/articles in a blog/Cms.
Sometimes it is modifiable by the user. Sometimes it is displayed (as a
civil date, of course) according
to some TZ implied somewhere else. It's normal that users with differnt TZ
sees this event each
with its own TZ; and one is not directly interested on obtaining (say)
an "inherent" civil datetime for the
event (for example one is not interested in asking what posts where
generated at midnight
acording to the localtime of the user that created it).

The LOCAL DATETIME is only of use for civil date-times, when one is not
directly interested in
asociate events with real (physicial time) - this cannot be compared with a
real time (it cant trigger alarms, eg)
Or, more rarely, when the TZ is implied somehere else (in the application,
not it the DB server!).

The DATETIME is equivalent to the compound type {LOCAL_DATETIME,TZ}. Here
the "civil date-time" is again
the primary concept one deals with, but in a given place in the world (TZ),
so it implies also (with the assistance of a zic table)
a real time. This type is, IMHO, less frequent than the others. The typical
use is for calendars or schedulers.

One could, a propos Andrew's observation, consider a fourth type: TIMESTAMP
WITH TZ. But it seems overkill:
except for ZIC changes, the correspondence with DATETIME is univocal (BTW,
this is why in the Jodatime API
-which does not deal with persistence- this concepts are strictly
equivalent). Given this nearly-equivalence, and that
the needing of this type in real life is (IMO) almost null, I think that
DATETIME is the one to survive.

(One could even propose a fifht type: a TIMESTAMP WITH GMT OFFSET (roughly
the ANSI proposal), which
would be equivalent to have a timestamp AND a local datetime; this is more
easy to deal with than timezones,
but (as was discussed here before) is too limited (does not allow
artithmetic) and is not orthogonal with the real useful types.)

Regarding implementation:

TIMESTAMP is straightforward, more or less the same as today: stored as
UTC, can be
input/output in ISO 8601 format (the client/server can use the offset they
like, internally it's translated to GMT+0)

LOCAL_DATETIME also is straightforward, also stored as UTC (as in GMT+0) .
BUT
- input/output in ISO 8601 format should not allow/produce GMT offset
- the similarity of implementation should not leak upwards. This types are
incompatible, cannot be compared, etc

DATETIME is the difficult one, of course.
- Equivalent to the pair {LOCAL_DATETIME,TX_id} (occupies more space)
- Requires some catalog table or something akin to codify consistently the
timezones as numbers (included in pg_dump output?)
- Requires new definitions for input/output (and deal with some
ambiguities, particulary in DST transitions)
- Requires some semantic definitions (orderig, equality)
- Some arithmetics (which involve convertion to physic time) may be
expensive, might require some aggresive caching of time (zic) calculations.

Global considerations:
- Backward compatibility?
- SQL spec compatibility? (is worsened?) (deprecate TIMESTAMP WITH
TIMEZONE?)
- implement conversion functions - castings (how strict?)
- discuss/implement interval types/functions
- interfaces (JDBC...)

A bit of work, granted... (I might help)... but I bet that the most
difficult work, by far, is to
reach an agreement :-)

Thanks for reading.

Hernán J. González


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "hernan gonzalez" <hgonzalez(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-19 22:09:19
Message-ID: 87iqd6b2vi.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Kevin" == Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:

>>> For example, if you have a type T which incorporates a timestamp
>>> and a timezone, what semantics does the T = T operator have? What
>>> semantics apply if the definitions of timezones change?

Kevin> I'd rather sort that out once and implement the desired
Kevin> semantics in the operators for a new type than to count on
Kevin> application programmers doing it consistently each time.
Kevin> Wouldn't you?

No, because the desired semantics are not the same for everyone, so
even if you take just the two examples I gave above, you're already
into combinatorial explosion with four different types needed.

By keeping it as a composite value, you allow the app to define the
semantics it needs.

>>> What if you're storing times of events at specific places; in
>>> that case you want to associate the timezone with the _place_ not
>>> the event (so that if the timezone rules change, moving the place
>>> from one timezone to another, you only have to change the place,
>>> not all the events that refer to it).

Kevin> I'm not sure I quite followed you there, but Hernán's example
Kevin> specifically called for storing 'Chile/Santiago', not a UTC
Kevin> offset or something as easily changed as the 'CLT' or 'CLST'
Kevin> time zone designations -- so it is tied to a place rather more
Kevin> closely than anything else.

But those place definitions do occasionally change. For example, some
US states can change timezone at county level; suppose a state that
was previously all one timezone decides to change timezone or DST
observance for all except a few counties that remain on the previous
setting. So places within those counties will have to change timezone
name from America/Somestate to America/Somestate/Oddcounty while
places in the rest of the state stay with America/Somestate.

The fact that geographic names are used for timezones doesn't mean
that the timezone name applicable to a given place doesn't change;
timezones in the database can split when rule changes happen that
don't affect the full extent of the previous zone; this leads to two
or more zones which have identical definitions up to some date, and
different definitions after it. (Zones can only split, they can't
merge, due to the necessity of keeping historical changes.)

Kevin> I think that was part of his point -- that for civil time you
Kevin> care about what the clock on a typical business's wall at that
Kevin> place will read on that date, regardless of what changes might
Kevin> happen in time zone definitions.

Right, but if timezone _boundaries_ change, this can't happen without
some manual corrections. (If the timezone _rules_ change without
changing the boundaries, then just updating the tzdata is enough if
you designed the db correctly.)

--
Andrew.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: hernan gonzalez <hgonzalez(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-29 01:40:43
Message-ID: 200911290140.nAT1ehK07981@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hernan gonzalez wrote:
> > hernan> The support of timezones is really crippled
> > ?hernan> now.
> >
> > Crippled how?
>
> Well, among other things, no builtin date-timetype allows me to save
> the timezone (or even the offset).
> No type allows to treat this three datetimes as different values.
> '2010-07-27 10:30 GMT+4' '2010-07-27 09:30 GMT+5' '2010-07-27 10:30 GMT+0'
> The ANSI spec at least permits that.

I think there is general agreement that we should have a timezone data
type which validates against pg_timezone_names().name. It might be
enough to just document how users can create such a domain data type,
but I don't know of a way to do that. Is this a TODO?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: hernan gonzalez <hgonzalez(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-29 09:31:41
Message-ID: 87ws19y90h.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Bruce" == Bruce Momjian <bruce(at)momjian(dot)us> writes:

Bruce> I think there is general agreement that we should have a
Bruce> timezone data type which validates against
Bruce> pg_timezone_names().name.

What happens when pg_timezone_names output changes? (which it can do,
especially if the install is using the OS tzdata; even if not using OS
tzdata, it's not expected to be stable even between point releases)

--
Andrew.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: hernan gonzalez <hgonzalez(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-29 13:09:03
Message-ID: 200911291309.nATD93G10824@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth wrote:
> >>>>> "Bruce" == Bruce Momjian <bruce(at)momjian(dot)us> writes:
>
> Bruce> I think there is general agreement that we should have a
> Bruce> timezone data type which validates against
> Bruce> pg_timezone_names().name.
>
> What happens when pg_timezone_names output changes? (which it can do,
> especially if the install is using the OS tzdata; even if not using OS
> tzdata, it's not expected to be stable even between point releases)

Uh, wow, yea, that would invalidate stored data --- yuck.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: hernan gonzalez <hgonzalez(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-29 18:49:48
Message-ID: 0CCF4300-ED9D-40CB-AE59-36ACDB5CA619@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov 28, 2009, at 5:40 PM, Bruce Momjian wrote:

> I think there is general agreement that we should have a timezone data
> type which validates against pg_timezone_names().name. It might be
> enough to just document how users can create such a domain data type,
> but I don't know of a way to do that. Is this a TODO?

From http://justatheory.com/computers/databases/postgresql/citext-patch-submitted.html

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
BEGIN
PERFORM now() AT TIME ZONE tz;
RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
RETURN FALSE;
END;
$$ language plpgsql STABLE;

CREATE DOMAIN timezone AS CITEXT
CHECK ( is_timezone( value ) );

It could also be TEXT I suppose, but "America/Los_Angeles" and "america/los_angeles" should be considered the same.

Best,

David


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, hernan gonzalez <hgonzalez(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-29 18:58:35
Message-ID: 162867790911291058u4e3dbf87i7e84d940bba82284@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/29 David E. Wheeler <david(at)kineticode(dot)com>:
> On Nov 28, 2009, at 5:40 PM, Bruce Momjian wrote:
>
>> I think there is general agreement that we should have a timezone data
>> type which validates against pg_timezone_names().name.  It might be
>> enough to just document how users can create such a domain data type,
>> but I don't know of a way to do that.  Is this a TODO?
>
> From http://justatheory.com/computers/databases/postgresql/citext-patch-submitted.html
>
> CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
> BEGIN
>  PERFORM now() AT TIME ZONE tz;
>  RETURN TRUE;
> EXCEPTION WHEN invalid_parameter_value THEN
>  RETURN FALSE;
> END;
> $$ language plpgsql STABLE;
>
> CREATE DOMAIN timezone AS CITEXT
> CHECK ( is_timezone( value ) );
>
> It could also be TEXT I suppose, but "America/Los_Angeles" and "america/los_angeles" should be considered the same.

nice :)

Pavel

>
> Best,
>
> David
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>