Re: timezones to own config file

Lists: pgsql-hackers
From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: timezones to own config file
Date: 2006-06-13 12:20:09
Message-ID: 20060613122008.GA2965@mcknight.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I looked into the timezone specifications and basically extracted a list of
existing offsets from the zic database.

My proposed format for the timezone files is something like this:

HADT -32400 D # Hawaii-Aleutain Daylight Time
# (America/Adak)
HAST -36000 # Hawaii-Aleutain Standard Time
# (America/Adak)

That is, the abbreviation, the offset in seconds, optionally a D to mark
daylight saving times (goes into tm->is_dst), the name of the timezone and
the full zic names that use this timezone.

I also made the extracting script find all conflicts and commented them
manually as shown here. Most of the conflicts are between America and Asia.

# CONFLICT! ADT is not unique
# Other timezones:
# - ADT: Arabic Daylight Time (Asia)
ADT -10800 D # Atlantic Daylight Time
# (America/Glace_Bay)
# (America/Goose_Bay)
# (America/Halifax)
# (America/Thule)
# (Atlantic/Bermuda)

However, even within all "America/..." names, there are conflicts. For
example CST is used as US Central Time and as Cuba Central Standard Time.
While US Central time is UTC-6h, Cuba Central Standard Time is UTC-5h.

Another problem is that lots of the timezone names that are hardcoded into
the backend seem to be way outdated or just doubtable, many of them do not
show up in the zic database.

For example NT (Nome Time) seemed to have existed until 1967, America/Nome
is listed in the zic database at AKDT/AKST which is Alaska Daylight/Standard
Time. Other examples:

JAYT, Jayapura Time: Asia/Jayapura is listed as EIT (East Indonesia Time) in
the zic database.

JAVT, Java Time (07:00? see JT): zic database says that it is outdated and
was used until 1932.
JT, Java Time (07:30? see JAVT): I did not find a proof that this is really
+7.5 hours, some sources say it's just 7 hours.

HMT is the strangest of the bunch, I have found the name "Heard and
Mc.Donald Time" but with a different offset. I could not find a reference to
some "Hellas"-Time as indicated in the comment.

So could we remove some of those on the grounds that they do not seem to
be used any more (please correct me here if someone knows more) and that
you can easily add offsets for those if you need them?

With the same argument we could even remove timezones like BDST (British
Double Summer Time), DNT (Dansk Normal Tid), FST (French Summer Time), NOR
(Norway Standard Time), SWT (Swedish Winter Time). Could anybody from those
countries comment on whether or not those are still used or just outdated? I
figure that most of those countries have moved since long to the more common
timezone names...

Ok, after all this has been sorted out I propose to make different files for
the different continents and let the user specify with a guc which ones he
wants to use.

I could think of three possible ways:

1) (See Toms idea in
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01048.php )

Conflicts within one set can just be commented - we would try to include
whatever will probably be used by the majority of users and comment the
other one(s). Conflicts between two sets would show up when postmaster gets
started, it would complain about different definitions for the same
timezone. An American who wants to use some Asian timezones would have to
work through both files and comment conflicting timezones on one side or the
other to make postmaster start up without errors.

2) Find out which timezones do not conflict, put them in a set and load this
by default. Create other sets that are conflicting but that have some
"override" capability with regard to previous timezone definitions. Decide
on the default value for the guc (could point to American timezones for
example). An Australian could either select only the Australian file or
could specify "America, Australia" and the Australian set overrides the
American timezones in case of conflicts. This way, most people do not have
to make changes and those who have to can specify their "override"-file and
keep all the rest, including non-conflicting timezones from a conflicting
timezone set.

3) Combine both, let the user specify the guc variable as "A, B, C" and look
into C first, then in B and then in A.... *thinking* Right now I actually
think that the "overriding" idea is not that intuitive, most people would
probably expect that this is a list of priorities, so A overrides B which
overrides C.

What do you think?

Having a larger token table in datetime.c does not seem to affect
performance all that much. I did parsing tests with 2 million timestamps
equally distributed over all timezone abbreviations that I had loaded
previously and the difference of 154 timezones in comparsion to other runs
with just 35 was at about ~120ms (on my quite slow laptop computer).

The timezone definition files should be read at server start but should they
also be read at SIGHUP? If so, should they be read only by the postmaster or
by all backends?

Joachim


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-13 12:52:27
Message-ID: 20060613125227.GD19212@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 13, 2006 at 02:20:09PM +0200, Joachim Wieland wrote:
> I looked into the timezone specifications and basically extracted a list of
> existing offsets from the zic database.
>
> My proposed format for the timezone files is something like this:

<sip>

Any particular reason this can't be a normal table in pg_catalog which
you can select/update.

> Another problem is that lots of the timezone names that are hardcoded into
> the backend seem to be way outdated or just doubtable, many of them do not
> show up in the zic database.

<snip lots of dodgy timezones>

I've been trying to convince people for a while now that the
appropriate tz string for australia is AEST/ACST/AWST but no-one seems
convinced yet. Hence, I never actually specify timezones and all my
timestamps are inserted as GMT.

IMHO, you should simply setup the table so that it is backward
compatable and let people edit it themselves. You're never going to be
able to convince anyone that people arn't relying on it exactly the way
it is now. The most important thing is to get rid of the
"australian_timezones" hack, everything else is bonus.

> The timezone definition files should be read at server start but should they
> also be read at SIGHUP? If so, should they be read only by the postmaster or
> by all backends?

Good question...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-13 15:16:07
Message-ID: 23535.1150211767@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joachim Wieland <joe(at)mcknight(dot)de> writes:
> The timezone definition files should be read at server start but should they
> also be read at SIGHUP? If so, should they be read only by the postmaster or
> by all backends?

Presumably the name of the definition file to use will be a GUC
variable. I would expect the code to re-read the file any time the
variable's value is changed. In the case of a change via postgresql.conf
this would automatically happen in all backends as well as the
postmaster. You'll need to make it follow the semantics already in use
for errors in postgresql.conf, viz:
* error detected during postmaster startup -> report error and quit
* error detected during postmaster reload -> log message, ignore new setting
* error detected during backend reload -> debug message, ignore new setting

As far as the appropriate contents of the files go, I'd suggest *not*
trying to account for every abbreviation mentioned in the zic database;
lots of them are surely uninteresting, and anyone who does want Nome Time
will now be able to add it for himself. The more abbreviations you try
to understand, the less chance you have of detecting plain old errors.

IIRC, the conflicts we've actually heard about in practice are IST
(Israel vs India) and Aussie vs. USA zone names. So it might work to
have two base definition files, one for Europe/Americas (with USA and
Israel names) and one for Far East (with Aussie and Indian names).

I am not sure where Tom Lockhart got the list of timezone names that's
currently hardwired in datetime.c, but for sure you needn't treat it as
being graven on stone tablets.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-13 15:51:25
Message-ID: 23905.1150213885@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Any particular reason this can't be a normal table in pg_catalog which
> you can select/update.

That doesn't do anything to help with one of the main problems: that
we have at least two (maybe more) alternative sets of names that people
might want as default. Getting rid of "australian_timezones" is fine,
but we can't do it by saying "all you aussies have to hack the standard
list according to your own ideas". I don't expect that very many people
will actually need to make custom timezone name lists --- if we find
they do, we'll need to work harder on the default lists. So the design
center should be "select one of a few predefined lists", not "hack away
on system catalog until you like it". Especially not if they have to do
it in template0, template1, postgres, etc. Basically, a GUC variable is
just about the right paradigm for this, a system catalog isn't.

I'd also be a bit worried about performance issues, eg, whether VACUUM
FULL on such a table would bring datetime operations to a halt.

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-13 21:11:26
Message-ID: 20060613211126.GJ19212@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > Any particular reason this can't be a normal table in pg_catalog which
> > you can select/update.
>
> That doesn't do anything to help with one of the main problems: that
> we have at least two (maybe more) alternative sets of names that people
> might want as default.

<snip>

I think my actual point was something else. We currently get calls from
people trying to administer machines that it's annoying that various
configuration information is stored in files, beyond the easy reach of
SQL.

What I was thinking is why we couldn't just store the information in a
global shared system table that is only read on config reload. You
could have a few columns, maybe the first being a list name, which is
referenced from a GUC.

If you issue a config reload during a VACUUM FULL, I guess that might
be an issue, yes. I was just thinking people might appreciate being
able to configure the timezones without opening a shell... Similarly,
it would also give a way for user-interfaces to get a list of available
valid timezones and their actual meanings, which is currently
impossible.

Just a thought really...
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-13 21:44:20
Message-ID: 20060613214420.GG34196@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 13, 2006 at 11:11:26PM +0200, Martijn van Oosterhout wrote:
> On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote:
> > Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > > Any particular reason this can't be a normal table in pg_catalog which
> > > you can select/update.
> >
> > That doesn't do anything to help with one of the main problems: that
> > we have at least two (maybe more) alternative sets of names that people
> > might want as default.
>
> <snip>
>
> I think my actual point was something else. We currently get calls from
> people trying to administer machines that it's annoying that various
> configuration information is stored in files, beyond the easy reach of
> SQL.
>
> What I was thinking is why we couldn't just store the information in a
> global shared system table that is only read on config reload. You
> could have a few columns, maybe the first being a list name, which is
> referenced from a GUC.
>
> If you issue a config reload during a VACUUM FULL, I guess that might
> be an issue, yes. I was just thinking people might appreciate being
> able to configure the timezones without opening a shell... Similarly,
> it would also give a way for user-interfaces to get a list of available
> valid timezones and their actual meanings, which is currently
> impossible.

ISTM that's an issue that affects all configuration stuff, not just the
timezones; if we're going to come up with a way to manage settings
without touching a file, it should work for everything.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-13 22:15:48
Message-ID: 11006.1150236948@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> What I was thinking is why we couldn't just store the information in a
> global shared system table that is only read on config reload. You
> could have a few columns, maybe the first being a list name, which is
> referenced from a GUC.

Hmmm ... if we keep the notion of a GUC that identifies a set of
compatible timezone names, then a table with a primary key of
(tz_set_name, tz_name) doesn't seem quite so awful. The main
remaining objection I can see is that the postmaster couldn't use
it, only backends. Now this doesn't matter much as far as timestamp
operations go because I don't think the postmaster does any operations
that need TZ data --- but what of verifying that the GUC variable has
a valid value in postgresql.conf at startup? If you're willing to
abandon sanity checking on that string, it might work.

One interesting thought about a system table is that it could be
referenced through a syscache, which'd have the nice property that only
the (probably few) values actually referenced in a given session need to
get loaded.

regards, tom lane


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-14 21:18:53
Message-ID: 20060614211853.GA2185@mcknight.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 13, 2006 at 11:11:26PM +0200, Martijn van Oosterhout wrote:
> I think my actual point was something else. We currently get calls from
> people trying to administer machines that it's annoying that various
> configuration information is stored in files, beyond the easy reach of
> SQL.

While that would be a nice feature I don't see yet how this should work in
detail.

> What I was thinking is why we couldn't just store the information in a
> global shared system table that is only read on config reload. You
> could have a few columns, maybe the first being a list name, which is
> referenced from a GUC.
> [...] Similarly, it would also give a way for user-interfaces to get a
> list of available valid timezones and their actual meanings, which is
> currently impossible.

I think you'll get lots of problems when you only read the table on config
reload. First you get inconsistencies. I update the table with new timezone
data and have to SIGHUP postmaster to read in the new table. In the meantime
users get a wrong list of available timezones.

Next, how does all that work with transactions and visibility? What if I
update and send SIGHUP, what gets applied while I have not yet committed? And
when the transaction rolls back, which version is active?

Third, what about dumping and restoring? Dumping would have to compare the
built-in set with the active settings and generate SQL commands from it,
right? Restoring would execute them and - to make that active - reload the
configuration which could have the side effects of activating other
settings.

> Just a thought really...

Here as well :-)
The config-by-sql approach seems to need some more specifications. If nobody
objects, I'll submit the patch in the form that it reads from a file, as Tom
pointed out this is a setting that gets only changed by few people and
moreover it's not part of regular administrator's work (like changes to
pg_hba.conf for example). Right now you have to get pgsql's source and
recompile, so having to edit a file and sending a SIGHUP seems to be quite an
improvement already. Getting a list of active timezones could be easily done
with a system view but you might not want to promise its existence when you
think about moving the whole thing to a system catalog later...

Joachim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: timezones to own config file
Date: 2006-06-14 21:40:47
Message-ID: 4046.1150321247@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joachim Wieland <joe(at)mcknight(dot)de> writes:
> Getting a list of active timezones could be easily done
> with a system view but you might not want to promise its existence when you
> think about moving the whole thing to a system catalog later...

A read-only view wouldn't be a bad idea, actually, for both the
long-form TZ names and the abbreviations. It'd be easy to cons one
up the same way as our other views based on functions. Doesn't even
need to be in core, could be contrib, if there's doubts about its
usefulness.

regards, tom lane