Re: generate_series() Interpretation

Lists: pgsql-hackers
From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: generate_series() Interpretation
Date: 2011-06-27 17:49:09
Message-ID: AC9381D0-1004-41CF-85E8-8F947565BA7D@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

I'm curious about behavior such as this:

bric=# select generate_series('2011-05-31'::timestamp , '2012-04-01'::timestamp, '1 month');
generate_series
---------------------
2011-05-31 00:00:00
2011-06-30 00:00:00
2011-07-30 00:00:00
2011-08-30 00:00:00
2011-09-30 00:00:00
2011-10-30 00:00:00
2011-11-30 00:00:00
2011-12-30 00:00:00
2012-01-30 00:00:00
2012-02-29 00:00:00
2012-03-29 00:00:00

It seems to me that this is subject to interpretation. If I was building a calendaring app, for example, I might rather that the results were:

generate_series
---------------------
2011-05-31 00:00:00
2011-06-30 00:00:00
2011-07-31 00:00:00
2011-08-31 00:00:00
2011-09-30 00:00:00
2011-10-31 00:00:00
2011-11-30 00:00:00
2011-12-31 00:00:00
2012-01-31 00:00:00
2012-02-29 00:00:00
2012-03-31 00:00:00

Is there some way to change the interpretation of interval calculation like this? Or would I just have to write my own function to do it the way I want?

Thanks,

David


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 17:54:01
Message-ID: 4E08C3B9.7070002@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/27/2011 10:49 AM, David E. Wheeler wrote:
> Hackers,
>
> I'm curious about behavior such as this:
>
> bric=# select generate_series('2011-05-31'::timestamp , '2012-04-01'::timestamp, '1 month');
> generate_series
> ---------------------
> 2011-05-31 00:00:00
> 2011-06-30 00:00:00
> 2011-07-30 00:00:00
> 2011-08-30 00:00:00
> 2011-09-30 00:00:00
> 2011-10-30 00:00:00
> 2011-11-30 00:00:00
> 2011-12-30 00:00:00
> 2012-01-30 00:00:00
> 2012-02-29 00:00:00
> 2012-03-29 00:00:00
>
> It seems to me that this is subject to interpretation. If I was building a calendaring app, for example, I might rather that the results were:
>
> generate_series
> ---------------------
> 2011-05-31 00:00:00
> 2011-06-30 00:00:00
> 2011-07-31 00:00:00
> 2011-08-31 00:00:00
> 2011-09-30 00:00:00
> 2011-10-31 00:00:00
> 2011-11-30 00:00:00
> 2011-12-31 00:00:00
> 2012-01-31 00:00:00
> 2012-02-29 00:00:00
> 2012-03-31 00:00:00
>
> Is there some way to change the interpretation of interval calculation like this? Or would I just have to write my own function to do it the way I want?
>
> Thanks,
>
> David
>
>

That's just how intervals that represent varying periods of time work.
You would need to write your own. But a series of end-of-month dates is
pretty easy:
select generate_series('2011-06-01'::timestamp ,
'2012-04-01'::timestamp, '1 month') - '1 day'::interval;
?column?
---------------------
2011-05-31 00:00:00
2011-06-30 00:00:00
2011-07-31 00:00:00
2011-08-31 00:00:00
2011-09-30 00:00:00
2011-10-31 00:00:00
2011-11-30 00:00:00
2011-12-31 00:00:00
2012-01-31 00:00:00
2012-02-29 00:00:00
2012-03-31 00:00:00

Cheers,
Steve


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 17:56:37
Message-ID: BA3B2597-4F1C-4720-B252-BB39AFD76FA9@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 27, 2011, at 10:54 AM, Steve Crawford wrote:

> That's just how intervals that represent varying periods of time work. You would need to write your own. But a series of end-of-month dates is pretty easy:
> select generate_series('2011-06-01'::timestamp , '2012-04-01'::timestamp, '1 month') - '1 day'::interval;

Yeah, but it's trickier if you have a calendaring app and don't know that date a user has chosen for a monthly recurring event. They might have selected June 30, in which case only February would ever need to be different than the default.

Best,

David


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 18:03:17
Message-ID: 4E087F95020000250003EC64@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> wrote:

> generate_series
> ---------------------
> 2011-05-31 00:00:00
> 2011-06-30 00:00:00
> 2011-07-31 00:00:00
> 2011-08-31 00:00:00
> 2011-09-30 00:00:00
> 2011-10-31 00:00:00
> 2011-11-30 00:00:00
> 2011-12-31 00:00:00
> 2012-01-31 00:00:00
> 2012-02-29 00:00:00
> 2012-03-31 00:00:00
>
> Is there some way to change the interpretation of interval
> calculation like this? Or would I just have to write my own
> function to do it the way I want?

It is precisely to support such fancy things that some products
support a more abstract date type which allows 31 days in any month,
and then normalizes to real dates as needed. The PostgreSQL
developer community has generally not been receptive to such use
cases. I think you need to iterate through month intervals and add
those to the starting date for now. If you want to start with the
last day of a month with less than 31 days, you may need to back up
a month or two to find a suitable month and offset your intervals by
the appropriate number of months.

I'd bet that if you encapsulate all that in a PostgreSQL function,
you're not the only one who would find it useful.

-Kevin


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 18:06:35
Message-ID: E472FE99-31AE-4DC7-BE9A-EB9883BF39AE@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 27, 2011, at 11:03 AM, Kevin Grittner wrote:

> It is precisely to support such fancy things that some products
> support a more abstract date type which allows 31 days in any month,
> and then normalizes to real dates as needed. The PostgreSQL
> developer community has generally not been receptive to such use
> cases. I think you need to iterate through month intervals and add
> those to the starting date for now. If you want to start with the
> last day of a month with less than 31 days, you may need to back up
> a month or two to find a suitable month and offset your intervals by
> the appropriate number of months.
>
> I'd bet that if you encapsulate all that in a PostgreSQL function,
> you're not the only one who would find it useful.

Yeah, did that a while ago:

http://www.justatheory.com/computers/databases/postgresql/recurring_events.html

I think it could be simpler now, with generate_series() for some intervals.

Best,

David


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 18:36:14
Message-ID: 4E08CD9E.3070906@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/27/2011 10:56 AM, David E. Wheeler wrote:
> On Jun 27, 2011, at 10:54 AM, Steve Crawford wrote:
>
>> That's just how intervals that represent varying periods of time work. You would need to write your own. But a series of end-of-month dates is pretty easy:
>> select generate_series('2011-06-01'::timestamp , '2012-04-01'::timestamp, '1 month') - '1 day'::interval;
> Yeah, but it's trickier if you have a calendaring app and don't know that date a user has chosen for a monthly recurring event. They might have selected June 30, in which case only February would ever need to be different than the default.
>
> Best,
>
> David
>
>
>
The query is marginally trickier. But the better calendaring apps give a
variety of options when selecting "repeat": A user who selects June 30,
2011 and wants a monthly repeat might want:

30th of every month - skip months without a 30th
30th of every month - move to end-of-month if 30th doesn't exist
Last day of every month
Last Thursday of every month

Typical payday repeats are "the 15th and last -day-of-month if a workday
or the closest preceding workday if not", "second and last Friday",
"every other Friday"...

No matter how '1 month' is interpreted in generate_series, the
application programmer will still need to write the queries required to
handle whatever calendar-repeat features are deemed necessary.

Cheers,
Steve


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 18:38:32
Message-ID: AA76EC8B-7880-466F-A614-8A6DF4060E29@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 27, 2011, at 11:36 AM, Steve Crawford wrote:

> The query is marginally trickier. But the better calendaring apps give a variety of options when selecting "repeat": A user who selects June 30, 2011 and wants a monthly repeat might want:
>
> 30th of every month - skip months without a 30th
> 30th of every month - move to end-of-month if 30th doesn't exist
> Last day of every month
> Last Thursday of every month
>
> Typical payday repeats are "the 15th and last -day-of-month if a workday or the closest preceding workday if not", "second and last Friday", "every other Friday"...
>
> No matter how '1 month' is interpreted in generate_series, the application programmer will still need to write the queries required to handle whatever calendar-repeat features are deemed necessary.

Yeah, which is why I said it was subject to interpretation. Of course there's no way to tell generate_series() which to use, which is what I figured.

Thanks,

David


From: Michael Nolan <htfoot(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Steve Crawford <scrawford(at)pinpointresearch(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 19:12:48
Message-ID: BANLkTi=y6jPhB506r2YvKwb7UBYgnCuw8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 27, 2011 at 1:38 PM, David E. Wheeler <david(at)kineticode(dot)com>wrote:

>
> Yeah, which is why I said it was subject to interpretation. Of course
> there's no way to tell generate_series() which to use, which is what I
> figured.
>

generate_series() is doing exactly what it was designed to do, the
imprecision regarding adding '1 month' to something that may or may not have
been intended to be 'last day of the month' is a limitation in the interval
code.

One way to change this would be to implement another interval type such as
'full_month' which would take a date that is know to be the last day of the
month and make it the last day of the appropriate month. If the starting
date is NOT the last day of a month, the existing logic would suffice.

Or you can do as I have done and create your own last_day() function that
takes any date and makes it the last day of that month, and apply it to the
output of generate_series();
--
Mike Nolan
nolan(at)tssi(dot)com


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 19:20:49
Message-ID: 4E08D811.7060201@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Yeah, which is why I said it was subject to interpretation. Of course there's no way to tell generate_series() which to use, which is what I figured.
>
Fortunately PostgreSQL uses the same interpretation for '1 month' when
used in generate_series that it does everywhere else - to do otherwise
would be hella confusing. :)

Cheers,
Steve


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 19:31:36
Message-ID: BANLkTim4Vm4aEN5kSn_Br1v7qHSsaeOWHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 27, 2011 at 1:49 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> Hackers,
>
> I'm curious about behavior such as this:
>
> bric=# select generate_series('2011-05-31'::timestamp , '2012-04-01'::timestamp, '1 month');
>   generate_series
> ---------------------
>  2011-05-31 00:00:00
>  2011-06-30 00:00:00
>  2011-07-30 00:00:00
>  2011-08-30 00:00:00
>  2011-09-30 00:00:00
>  2011-10-30 00:00:00
>  2011-11-30 00:00:00
>  2011-12-30 00:00:00
>  2012-01-30 00:00:00
>  2012-02-29 00:00:00
>  2012-03-29 00:00:00
>
> It seems to me that this is subject to interpretation. If I was building a calendaring app, for example, I might rather that the results were:
>
>   generate_series
> ---------------------
>  2011-05-31 00:00:00
>  2011-06-30 00:00:00
>  2011-07-31 00:00:00
>  2011-08-31 00:00:00
>  2011-09-30 00:00:00
>  2011-10-31 00:00:00
>  2011-11-30 00:00:00
>  2011-12-31 00:00:00
>  2012-01-31 00:00:00
>  2012-02-29 00:00:00
>  2012-03-31 00:00:00
>
> Is there some way to change the interpretation of interval calculation like this? Or would I just have to write my own function to do it the way I want?

It's not hugely difficult to get something pretty appropriate:

emp(at)localhost-> select generate_series('2011-06-01'::timestamp ,
'2012-04-01'::timestamp, '1 month')- '1 day' ::interval;
?column?
---------------------
2011-05-31 00:00:00
2011-06-30 00:00:00
2011-07-31 00:00:00
2011-08-31 00:00:00
2011-09-30 00:00:00
2011-10-31 00:00:00
2011-11-30 00:00:00
2011-12-31 00:00:00
2012-01-31 00:00:00
2012-02-29 00:00:00
2012-03-31 00:00:00
(11 rows)

That's more or less a bit of "cleverness." But it's not so grossly
clever as to seem too terribly frightful.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 19:36:36
Message-ID: BANLkTi=2_hTDcm0-qK-KCNW=2taLjc-bWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 27, 2011 at 2:36 PM, Steve Crawford
<scrawford(at)pinpointresearch(dot)com> wrote:
> On 06/27/2011 10:56 AM, David E. Wheeler wrote:
>>
>> On Jun 27, 2011, at 10:54 AM, Steve Crawford wrote:
>>
>>> That's just how intervals that represent varying periods of time work.
>>> You would need to write your own. But a series of end-of-month dates is
>>> pretty easy:
>>> select generate_series('2011-06-01'::timestamp , '2012-04-01'::timestamp,
>>> '1 month') - '1 day'::interval;
>>
>> Yeah, but it's trickier if you have a calendaring app and don't know that
>> date a user has chosen for a monthly recurring event. They might have
>> selected June 30, in which case only February would ever need to be
>> different than the default.
>>
>> Best,
>>
>> David
>>
>>
>>
> The query is marginally trickier. But the better calendaring apps give a
> variety of options when selecting "repeat": A user who selects June 30, 2011
> and wants a monthly repeat might want:
>
> 30th of every month - skip months without a 30th
> 30th of every month - move to end-of-month if 30th doesn't exist
> Last day of every month
> Last Thursday of every month
>
> Typical payday repeats are "the 15th and last -day-of-month if a workday or
> the closest preceding workday if not", "second and last Friday", "every
> other Friday"...
>
> No matter how '1 month' is interpreted in generate_series, the application
> programmer will still need to write the queries required to handle whatever
> calendar-repeat features are deemed necessary.

If you look up David Skoll's "remind" application
<http://www.roaringpenguin.com/products/remind>, you'll find something
that does this kind of pattern matching quite, quite well, at a rather
sophisticated level.

I find that I don't want to go through the struggle of figuring out
how to correctly describe those recurrences.

The other way of doing this sort of thing, which seems to be generally
more intuitive, is to treat these calendars as sets, which are a
structure that SQL is rather will designed to manipulate, and use
inclusions/exclusions/intersections to determine which days are of
interest.

I wrote something on this on pgsql-general about 5 years ago that
still seems pretty relevant.

http://archives.postgresql.org/pgsql-general/2006-02/msg00159.php
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Steve Crawford <scrawford(at)pinpointresearch(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generate_series() Interpretation
Date: 2011-06-27 20:19:05
Message-ID: 208A6FF0-6462-4624-86DD-E4D794D5BD33@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 27, 2011, at 12:36 PM, Christopher Browne wrote:

> I wrote something on this on pgsql-general about 5 years ago that
> still seems pretty relevant.
>
> http://archives.postgresql.org/pgsql-general/2006-02/msg00159.php

iwantsandy.com (now defunct) originally had a solution like this. However it supported a slew of recurrences:

* hours
* 2xday
* days
* weeks
* months
* quarters
* years
* decades

We had materializations of all of these going out 5 years or so. It took up an incredible amount of database space and was really slow. I replaced it with a variation on the code described in this blog post:

http://www.justatheory.com/computers/databases/postgresql/recurring_events.html

The database was a fraction of the original size and, because views were usually limited to a month at most, the number of rows generated for a query to show recurring events was quite limited (no one had an hourly reminder that when for more than a couple of days). Queries were a lot faster, too.

So I think the materialization of dates can work in certain limited cases such as your "vacations 2005" example, and will be easier to use thanks to JOINs, I found that it performed poorly and was unnecessarily resource-intensive for our usage. And I suspect the same would be try for anyone building a calendar app with more than one simple kind of limited recurrence.

Best,

David