Re: Procedure for feature requests?

Lists: pgsql-general
From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Procedure for feature requests?
Date: 2009-10-01 23:35:25
Message-ID: m31vlmn0w2.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

suppose I thought that PostgreSQL would benefit greatly from
a "generate_series(DATE, DATE[, INT]) RETURNS DATE" function
- where do I suggest such a thing? Here on -general? On
-hackers? Directly edit
<URI:http://wiki.postgresql.org/wiki/Todo>?

Suppose the feature request was not a trivial one, but
maybe a "DEPENDS ON <object>" clause for "CREATE FUNCTION"
to allow PostgreSQL to deny requests to drop a table/view/
function that is needed by a function - where would I pro-
pose that?

TIA,
Tim


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-01 23:44:30
Message-ID: 20091001234430.GN5607@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tim Landscheidt wrote:
> Hi,
>
> suppose I thought that PostgreSQL would benefit greatly from
> a "generate_series(DATE, DATE[, INT]) RETURNS DATE" function
> - where do I suggest such a thing? Here on -general? On
> -hackers? Directly edit
> <URI:http://wiki.postgresql.org/wiki/Todo>?

I think direct edition of Todo is discouraged, particularly for
outsiders. Suppose we decided that we didn't want to implement your
suggestion for whatever reason? We get lots of people coming out of the
blue with a patch to implement something they found on Todo, only to
figure out that we didn't want the idea implemented in the first place.
Surely we don't want to turn people away from development just because
Todo is not well managed.

> Suppose the feature request was not a trivial one, but
> maybe a "DEPENDS ON <object>" clause for "CREATE FUNCTION"
> to allow PostgreSQL to deny requests to drop a table/view/
> function that is needed by a function - where would I pro-
> pose that?

On -hackers, just like any other feature request, trivial or not.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-02 14:10:17
Message-ID: 20091002141017.GA5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Oct 01, 2009 at 11:35:25PM +0000, Tim Landscheidt wrote:
> suppose I thought that PostgreSQL would benefit greatly from
> a "generate_series(DATE, DATE[, INT]) RETURNS DATE" function

8.4 has a generate_series(timestamp,timestamp,interval) which would seem
to be a bit more flexible than you want.

--
Sam http://samason.me.uk/


From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-02 21:37:20
Message-ID: m31vlla35b.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> [...]
>> Suppose the feature request was not a trivial one, but
>> maybe a "DEPENDS ON <object>" clause for "CREATE FUNCTION"
>> to allow PostgreSQL to deny requests to drop a table/view/
>> function that is needed by a function - where would I pro-
>> pose that?

> On -hackers, just like any other feature request, trivial or not.

Thanks. Any particular form? A quick glance at the archives
did not reveal any feature requests that were not accompa-
nied by a patch :-).

Tim


From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-02 21:48:14
Message-ID: m3ws3d8o2p.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

>> suppose I thought that PostgreSQL would benefit greatly from
>> a "generate_series(DATE, DATE[, INT]) RETURNS DATE" function

> 8.4 has a generate_series(timestamp,timestamp,interval) which would seem
> to be a bit more flexible than you want.

Yes, I know :-). But as "generate_series(A, B, C)" can also
be written as "A + generate_series(0, (C - B) / C) * C" (or
something "flexible" like that :-)), a
"generate_series(DATE, DATE)" would inter alia get rid off
the need to cast the result from TIMESTAMP to DATE and to
explicitly specify "'1 day'". Just a small, trivial enhance-
ment for a popular use case :-).

Tim


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-02 21:48:42
Message-ID: 1254520122.27877.0.camel@jd-desktop.unknown.charter.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 2009-10-02 at 21:37 +0000, Tim Landscheidt wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
> > [...]
> >> Suppose the feature request was not a trivial one, but
> >> maybe a "DEPENDS ON <object>" clause for "CREATE FUNCTION"
> >> to allow PostgreSQL to deny requests to drop a table/view/
> >> function that is needed by a function - where would I pro-
> >> pose that?
>
> > On -hackers, just like any other feature request, trivial or not.
>
> Thanks. Any particular form? A quick glance at the archives
> did not reveal any feature requests that were not accompa-
> nied by a patch :-).
>

There is not a specific format but a good thing to do is:

Request Feature
Provide Use Case
Provide Example syntax (if applicable)

Joshua D. Drake

> Tim
>
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 15:23:36
Message-ID: 20091003152336.GB5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Oct 02, 2009 at 09:48:14PM +0000, Tim Landscheidt wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > 8.4 has a generate_series(timestamp,timestamp,interval) which would seem
> > to be a bit more flexible than you want.
>
> Yes, I know :-). But as "generate_series(A, B, C)" can also
> be written as "A + generate_series(0, (C - B) / C) * C" (or
> something "flexible" like that :-)), a

For things as complicated as timestamps I'm not sure if this is such a
trivial transform. If you can figure out the limit then it seems easy,
though I'm not sure how you'd do that.

> "generate_series(DATE, DATE)" would inter alia get rid off
> the need to cast the result from TIMESTAMP to DATE and to
> explicitly specify "'1 day'". Just a small, trivial enhance-
> ment for a popular use case :-).

Interesting, I tend to aim for maximum expressiveness not ease of
expressiveness. It would be somewhat easy to add the above if you want
though:

CREATE FUNCTION generate_series(date,date)
RETURNS SETOF date
IMMUTABLE LANGUAGE sql AS $$
SELECT generate_series($1::timestamp,$2::timestamp,interval '1 day')::date;
$$;

or I suppose you could use the integer series generation:

SELECT $1 + generate_series(0,$2 - $1);

Hum, now I'll have to see which is "better".

That second version seems to be slightly quicker (20 to 30%, for ranges
from a year up to a century respectively) so you may prefer it, but the
difference is going to be in the noise for any query I've ever used
generate_series for.

--
Sam http://samason.me.uk/


From: Martin Gainty <mgainty(at)hotmail(dot)com>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 16:14:19
Message-ID: BLU142-W108163601A9B7C24761566AED10@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


is there a way to create a cast with assignment e.g.?
CREATE CAST ((date,date) AS int4) WITH FUNCTION generate_series(date,date) AS ASSIGNMENT;http://www.postgresql.org/docs/8.4/static/sql-createcast.html
?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.

> Date: Sat, 3 Oct 2009 16:23:36 +0100
> From: sam(at)samason(dot)me(dot)uk
> To: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Procedure for feature requests?
>
> On Fri, Oct 02, 2009 at 09:48:14PM +0000, Tim Landscheidt wrote:
> > Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > > 8.4 has a generate_series(timestamp,timestamp,interval) which would seem
> > > to be a bit more flexible than you want.
> >
> > Yes, I know :-). But as "generate_series(A, B, C)" can also
> > be written as "A + generate_series(0, (C - B) / C) * C" (or
> > something "flexible" like that :-)), a
>
> For things as complicated as timestamps I'm not sure if this is such a
> trivial transform. If you can figure out the limit then it seems easy,
> though I'm not sure how you'd do that.
>
> > "generate_series(DATE, DATE)" would inter alia get rid off
> > the need to cast the result from TIMESTAMP to DATE and to
> > explicitly specify "'1 day'". Just a small, trivial enhance-
> > ment for a popular use case :-).
>
> Interesting, I tend to aim for maximum expressiveness not ease of
> expressiveness. It would be somewhat easy to add the above if you want
> though:
>
> CREATE FUNCTION generate_series(date,date)
> RETURNS SETOF date
> IMMUTABLE LANGUAGE sql AS $$
> SELECT generate_series($1::timestamp,$2::timestamp,interval '1 day')::date;
> $$;
>
> or I suppose you could use the integer series generation:
>
> SELECT $1 + generate_series(0,$2 - $1);
>
> Hum, now I'll have to see which is "better".
>
> That second version seems to be slightly quicker (20 to 30%, for ranges
> from a year up to a century respectively) so you may prefer it, but the
> difference is going to be in the noise for any query I've ever used
> generate_series for.
>
> --
> Sam http://samason.me.uk/
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/171222986/direct/01/


From: Corey Tisdale <corey(at)eyewantmedia(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Embarassing GROUP question
Date: 2009-10-03 16:25:14
Message-ID: 36AA710D-69ED-4C8D-8F79-C3E69DB36C0E@eyewantmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi all,

I'm a recent MySQL convert, and I am having some difficulty with
syntax on grouping a table by a foreign key and returning only the
newest entry that matches. In MySQL, you can do something like

event_log
------------
id
event_type
event_date
meaningful_data

SELECT
meaningful_data,
event_type,
event_date
FROM
event_log
GROUP BY
event_type
ORDER BY
event_date DESC

And this would return back the most recent event and meaningful data
for each event type. When I try this in postgres, I get errors about
either grouping by id or using id in an aggregate function. I am
afraid if I use it in an aggregate function, it will disrupt the sort
order I am attempting, and if I group by it, I no longer get the most
recent data for each event type, but I get every event. How would I
accomplish this is postresql? Is there some aggregate functions that
help with this, or should I be trying a different method altogether?
Grouping feels pretty different, and I suspect closer to standards, in
postgres. Unfortunately, my mind works (for the short term) in mysql.
Will someone please give me some pointers?

Thanks!
Corey


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 16:32:39
Message-ID: 20091003163239.GE5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

[ please don't top-post it's difficult to follow for those not directly
involved in the discussion ]

On Sat, Oct 03, 2009 at 12:14:19PM -0400, Martin Gainty wrote:
> is there a way to create a cast with assignment e.g.?

Hum, I'm unsure how this would help. Maybe more explanation would help?

> CREATE CAST ((date,date) AS int4) WITH FUNCTION generate_series(date,date) AS ASSIGNMENT;

generate_series returns to a SETOF values. It also has *much* more
complicated semantics than I'd expect most people would attribute as
useful to a datatype conversion function. For example, why would
casting from a pair of dates end up as a set of rows containing a single
date value?

I have a large problem understanding the real purpose casts so maybe I'm
missing something. My problem is that I don't understand the purpose
of trying to provide a "standard" way of converting between arbitrary
datatypes, it seems much easier to just provide a standard set of domain
specific functions that are explicitly used by the user. The SQL
standard specifies that they need to exist so PG has to support them,
but their purpose still confuses me!

--
Sam http://samason.me.uk/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 16:48:57
Message-ID: 1776.1254588537@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> I have a large problem understanding the real purpose casts so maybe I'm
> missing something. My problem is that I don't understand the purpose
> of trying to provide a "standard" way of converting between arbitrary
> datatypes,

I think the reason CREATE CAST exists is exactly that the cast mechanism
*isn't* intended to provide conversions between any arbitrary pair of
datatypes. It's only intended to provide conversions in those cases
where the conversion semantics are obvious to some degree or other.
Since that's somewhat in the eye of the beholder, we allow the user
to adjust edge cases by creating/removing casts --- but there's no
expectation that when you define a new datatype, you'll provide casts
to or from unrelated types.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Corey Tisdale <corey(at)eyewantmedia(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 17:05:49
Message-ID: 1998.1254589549@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Corey Tisdale <corey(at)eyewantmedia(dot)com> writes:
> SELECT
> meaningful_data,
> event_type,
> event_date
> FROM
> event_log
> GROUP BY
> event_type
> ORDER BY
> event_date DESC

Is event_type a primary key, or at least a candidate key, for this
table? (I would guess not based on the name.)

If it is, then the above is actually well-defined, because there is
only one possible input row for each group. The GROUP BY is actually
kinda pointless in that case.

If it is not, then the above is *not* well-defined --- there are
multiple possible meaningful_data and event_date values for each
event_type value, and you have absolutely no idea which ones you
will get. This is not allowed per SQL standard, and MySQL has
done you no service by failing to detect the ambiguity.

What you might be after is something like Postgres' DISTINCT ON
feature, which allows you to resolve the ambiguity by specifying
a sort order for the rows within each group (and then taking the
first row in each group). See the "weather reports" example in
our SELECT reference page.

I have never really played around with this aspect of MySQL ...
but looking at this example, and presuming that you find that
it actually does something useful, I wonder whether they interpret
the combination of GROUP BY and ambiguous-per-spec ORDER BY
in some fashion similar to DISTINCT ON.

regards, tom lane


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 17:24:42
Message-ID: 20091003172442.GF5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Oct 03, 2009 at 12:48:57PM -0400, Tom Lane wrote:
> I think the reason CREATE CAST exists is exactly that the cast mechanism
> *isn't* intended to provide conversions between any arbitrary pair of
> datatypes. It's only intended to provide conversions in those cases
> where the conversion semantics are obvious to some degree or other.

Yup, but the decision to officially bless some code as being a cast
rather than "just" a function seems very arbitrary, I think this is why
I don't understand its purpose.

> Since that's somewhat in the eye of the beholder, we allow the user
> to adjust edge cases by creating/removing casts --- but there's no
> expectation that when you define a new datatype, you'll provide casts
> to or from unrelated types.

I know there's no expectation to create any casts. I think what I'm
confused about is why anyone would ever bother creating any in the first
place. I have a feeling I may have used the functionality once, but
I can't think why or for what now. Having a function seems just as
expressive to me, which is why I think I'm missing the point.

--
Sam http://samason.me.uk/


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 17:31:26
Message-ID: 20091003173126.GG5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
> What you might be after is something like Postgres' DISTINCT ON
> feature

Yup, looks that way to me as well.

> I have never really played around with this aspect of MySQL ...

Me neither.

> but looking at this example, and presuming that you find that
> it actually does something useful, I wonder whether they interpret
> the combination of GROUP BY and ambiguous-per-spec ORDER BY
> in some fashion similar to DISTINCT ON.

Yup, does look that way doesn't it. It's still a weird pair of
semantics to conflate.

Hum, if they were assuming that you'd always have to implement GROUP BY
by doing a sort step first then I can see why they'd end up with this.
But if you want to do *anything* else (i.e. hash aggregate in PG) then
you want to keep the semantics of GROUP BY and ORDER BY separate as the
spec and indeed PG does.

--
Sam http://samason.me.uk/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 20:14:21
Message-ID: 8524.1254600861@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> On Sat, Oct 03, 2009 at 12:48:57PM -0400, Tom Lane wrote:
>> I think the reason CREATE CAST exists is exactly that the cast mechanism
>> *isn't* intended to provide conversions between any arbitrary pair of
>> datatypes. It's only intended to provide conversions in those cases
>> where the conversion semantics are obvious to some degree or other.

> Yup, but the decision to officially bless some code as being a cast
> rather than "just" a function seems very arbitrary, I think this is why
> I don't understand its purpose.

It's useful when the conversion semantics are sufficiently natural that
you want the conversion to be applied implicitly. I agree that the
explicit CAST syntax hasn't got very much to recommend it over a
function call. That part you can blame on the SQL committee ;-) ...
the historical PostQUEL syntax for this was exactly a function call,
and you can still write it that way if you choose.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org, Corey Tisdale <corey(at)eyewantmedia(dot)com>
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 20:39:40
Message-ID: 8798.1254602380@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
>> but looking at this example, and presuming that you find that
>> it actually does something useful, I wonder whether they interpret
>> the combination of GROUP BY and ambiguous-per-spec ORDER BY
>> in some fashion similar to DISTINCT ON.

> Yup, does look that way doesn't it. It's still a weird pair of
> semantics to conflate.

I poked around in the MySQL 5.1 manual to see if this is true.
I think it isn't --- it says very clearly here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
that you simply get an arbitrary choice among the possible values
when you reference an ambiguous column. It's possible that Corey's
query actually does give him the answers he wants, but apparently
it would be an implementation artifact that they're not promising
to maintain.

> Hum, if they were assuming that you'd always have to implement GROUP BY
> by doing a sort step first then I can see why they'd end up with this.

It's worse than that --- they actually are promising that GROUP BY
orders the results! In
http://dev.mysql.com/doc/refman/5.1/en/select.html
I find

If you use GROUP BY, output rows are sorted according to the
GROUP BY columns as if you had an ORDER BY for the same
columns. To avoid the overhead of sorting that GROUP BY
produces, add ORDER BY NULL:

SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

MySQL extends the GROUP BY clause so that you can also specify
ASC and DESC after columns named in the clause:

SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;

The first of these examples implies that they allow ORDER BY to override
the default GROUP BY sorting, which would mean that the ORDER BY sort
has to happen after the GROUP BY operation, unlike the approach we take
for DISTINCT ON. So that means the ORDER BY *isn't* going to affect
which row gets chosen out of each event_type group.

What I am currently betting is that Corey's query does not really do
what he thinks it does in MySQL. It probably is selecting a random
representative row in each group and then sorting on the basis of the
event_dates in those rows.

regards, tom lane


From: Corey Tisdale <corey(at)eyewantmedia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sam Mason <sam(at)samason(dot)me(dot)uk>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 21:56:02
Message-ID: FC8F94CA-4637-49CC-AB3A-43983209B009@eyewantmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

We're coming from mysql 4, and changing the sort order changes the
values of all columns as you would expect, given that you would expect
a sort statement to affect grouping. This certainly isn't the only
time I've used this syntax. I've been mysql user for ten years, and
the outcome has been consistant across hundreds of tables and millions
of rows and thousands of queries. If you ever have to use or modify a
mysql db, just keep this in mind in case it saves you some time.

That being said, we've discovered a few instances where docs were
wrong, found numerous bugs with bitshifting and blob objects and cache
usage and io buffering. We even sarted working on our own storage
engine until we came to our senses and switched RDBMSeses.

5.1 has chased more than a few folks off, and rather than upgrade to
it, we started porting to postgres. I didn't mean for my comparison to
appearas a knock against postgres, merely to explain why I was having
such a problem with such a simple issue. Thanks again for the help.

Corey Tisdale

On Oct 3, 2009, at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
>> On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
>>> but looking at this example, and presuming that you find that
>>> it actually does something useful, I wonder whether they interpret
>>> the combination of GROUP BY and ambiguous-per-spec ORDER BY
>>> in some fashion similar to DISTINCT ON.
>
>> Yup, does look that way doesn't it. It's still a weird pair of
>> semantics to conflate.
>
> I poked around in the MySQL 5.1 manual to see if this is true.
> I think it isn't --- it says very clearly here:
> http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
> that you simply get an arbitrary choice among the possible values
> when you reference an ambiguous column. It's possible that Corey's
> query actually does give him the answers he wants, but apparently
> it would be an implementation artifact that they're not promising
> to maintain.
>
>> Hum, if they were assuming that you'd always have to implement
>> GROUP BY
>> by doing a sort step first then I can see why they'd end up with
>> this.
>
> It's worse than that --- they actually are promising that GROUP BY
> orders the results! In
> http://dev.mysql.com/doc/refman/5.1/en/select.html
> I find
>
> If you use GROUP BY, output rows are sorted according to the
> GROUP BY columns as if you had an ORDER BY for the same
> columns. To avoid the overhead of sorting that GROUP BY
> produces, add ORDER BY NULL:
>
> SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
>
> MySQL extends the GROUP BY clause so that you can also specify
> ASC and DESC after columns named in the clause:
>
> SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;
>
> The first of these examples implies that they allow ORDER BY to
> override
> the default GROUP BY sorting, which would mean that the ORDER BY sort
> has to happen after the GROUP BY operation, unlike the approach we
> take
> for DISTINCT ON. So that means the ORDER BY *isn't* going to affect
> which row gets chosen out of each event_type group.
>
> What I am currently betting is that Corey's query does not really do
> what he thinks it does in MySQL. It probably is selecting a random
> representative row in each group and then sorting on the basis of the
> event_dates in those rows.
>
> regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


From: Martin Gainty <mgainty(at)hotmail(dot)com>
To: <corey(at)eyewantmedia(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 22:18:50
Message-ID: BLU142-W1630B1C5245E25DDC2CA79AED10@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


Most Database Administrators dont allow jpg/png/gifs into BLOB columns simply because its Run-length
encoding and MUCH easier to store the picture's link e.g. http://www.mywebsite.com/PictureOfFido.jpg

Oracle on the other hand can store multi-gb images into blobs then again you're paying for that 'luxury'

Also keep in mind Postgres is under BSD license so you're getting what you pay for
<LegalStuff/>

PostgreSQL Database Management System

(formerly known as Postgres, then as Postgres95)

Portions Copyright (c) 1996-2009, The PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
</LegalStuff>

what types of caching issues are you experencing?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.

> From: corey(at)eyewantmedia(dot)com
> To: tgl(at)sss(dot)pgh(dot)pa(dot)us
> Subject: Re: [GENERAL] Embarassing GROUP question
> Date: Sat, 3 Oct 2009 16:56:02 -0500
> CC: sam(at)samason(dot)me(dot)uk; pgsql-general(at)postgresql(dot)org
>
> We're coming from mysql 4, and changing the sort order changes the
> values of all columns as you would expect, given that you would expect
> a sort statement to affect grouping. This certainly isn't the only
> time I've used this syntax. I've been mysql user for ten years, and
> the outcome has been consistant across hundreds of tables and millions
> of rows and thousands of queries. If you ever have to use or modify a
> mysql db, just keep this in mind in case it saves you some time.
>
> That being said, we've discovered a few instances where docs were
> wrong, found numerous bugs with bitshifting and blob objects and cache
> usage and io buffering. We even sarted working on our own storage
> engine until we came to our senses and switched RDBMSeses.
>
> 5.1 has chased more than a few folks off, and rather than upgrade to
> it, we started porting to postgres. I didn't mean for my comparison to
> appearas a knock against postgres, merely to explain why I was having
> such a problem with such a simple issue. Thanks again for the help.
>
> Corey Tisdale
>
> On Oct 3, 2009, at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> >> On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
> >>> but looking at this example, and presuming that you find that
> >>> it actually does something useful, I wonder whether they interpret
> >>> the combination of GROUP BY and ambiguous-per-spec ORDER BY
> >>> in some fashion similar to DISTINCT ON.
> >
> >> Yup, does look that way doesn't it. It's still a weird pair of
> >> semantics to conflate.
> >
> > I poked around in the MySQL 5.1 manual to see if this is true.
> > I think it isn't --- it says very clearly here:
> > http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
> > that you simply get an arbitrary choice among the possible values
> > when you reference an ambiguous column. It's possible that Corey's
> > query actually does give him the answers he wants, but apparently
> > it would be an implementation artifact that they're not promising
> > to maintain.
> >
> >> Hum, if they were assuming that you'd always have to implement
> >> GROUP BY
> >> by doing a sort step first then I can see why they'd end up with
> >> this.
> >
> > It's worse than that --- they actually are promising that GROUP BY
> > orders the results! In
> > http://dev.mysql.com/doc/refman/5.1/en/select.html
> > I find
> >
> > If you use GROUP BY, output rows are sorted according to the
> > GROUP BY columns as if you had an ORDER BY for the same
> > columns. To avoid the overhead of sorting that GROUP BY
> > produces, add ORDER BY NULL:
> >
> > SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
> >
> > MySQL extends the GROUP BY clause so that you can also specify
> > ASC and DESC after columns named in the clause:
> >
> > SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;
> >
> > The first of these examples implies that they allow ORDER BY to
> > override
> > the default GROUP BY sorting, which would mean that the ORDER BY sort
> > has to happen after the GROUP BY operation, unlike the approach we
> > take
> > for DISTINCT ON. So that means the ORDER BY *isn't* going to affect
> > which row gets chosen out of each event_type group.
> >
> > What I am currently betting is that Corey's query does not really do
> > what he thinks it does in MySQL. It probably is selecting a random
> > representative row in each group and then sorting on the basis of the
> > event_dates in those rows.
> >
> > regards, tom lane
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/177141665/direct/01/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Corey Tisdale <corey(at)eyewantmedia(dot)com>
Cc: Sam Mason <sam(at)samason(dot)me(dot)uk>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 22:53:55
Message-ID: 25089.1254610435@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Corey Tisdale <corey(at)eyewantmedia(dot)com> writes:
> We're coming from mysql 4, and changing the sort order changes the
> values of all columns as you would expect, given that you would expect
> a sort statement to affect grouping. This certainly isn't the only
> time I've used this syntax. I've been mysql user for ten years, and
> the outcome has been consistant across hundreds of tables and millions
> of rows and thousands of queries. If you ever have to use or modify a
> mysql db, just keep this in mind in case it saves you some time.

Okay, I got sufficiently interested to drag out the nearest copy of
mysql and try it ...

mysql> create table t (f1 int, f2 int, f3 int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t values(1,11,111), (1,22,222), (1,44,444), (1,33,333);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0

mysql> insert into t values(2,55,555), (2,22,222), (2,44,444), (2,33,333);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0

mysql> insert into t values(3,55,555), (3,22,222), (3,44,444), (3,77,777);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0

mysql> select * from t group by f1 order by f2;
+------+------+------+
| f1 | f2 | f3 |
+------+------+------+
| 1 | 11 | 111 |
| 2 | 55 | 555 |
| 3 | 55 | 555 |
+------+------+------+
3 rows in set (0.00 sec)

mysql> select * from t group by f1 order by f2 desc;
+------+------+------+
| f1 | f2 | f3 |
+------+------+------+
| 2 | 55 | 555 |
| 3 | 55 | 555 |
| 1 | 11 | 111 |
+------+------+------+
3 rows in set (0.00 sec)

Looks to me like we're arbitrarily getting the physically-first row in
each f1 group. It's certainly not looking for the minimum or maximum f2.

The above is with 5.1.37, but I find essentially the same wording in
the 3.x/4.x manual as in the 5.1 manual.

Now it's certainly possible that in particular circumstances you might
happen to get the right results --- for example, a scan that was using
an index might happen to deliver the rows in the right order. But I
don't see any evidence that mysql is reliably producing groupwise
minimums or maximums with this syntax. The long discussions in the
comments here:
http://dev.mysql.com/doc/refman/4.1/en/example-maximum-column-group-row.html
don't suggest that anyone else believes it works, either.

regards, tom lane


From: Corey Tisdale <corey(at)eyewantmedia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sam Mason <sam(at)samason(dot)me(dot)uk>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 23:12:20
Message-ID: AACC696E-4D4D-4DD2-BE34-3A6A158ED32D@eyewantmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

You may have nailed it. Everythig would have been indexed I. The order
it was grouped by, so perhaps the order in which things are indexed
and accesse is the kicker, or perhaps we've been consistantly lucky.

We also weren't adding image data to blobs, we were bit mapping
faceted data to blob and shifting to allow people to shop by artist or
color or subject matter across millions of posters. Normalized tables
just weren't cutting it, and bit shifting up to 32 bit was crazy fast.
After we rolled it out in production, we found mysql converts blobs to
32 bit unsigned ints before shifting. Postgres appears to not do this
at all, or our arbitrarily large test data did not trigger it on
postgres.

After the last few days, it is becoming apparent how much of a joke
mysql has been. Thanks again for such quick direction!

Corey Tisdale

On Oct 3, 2009, at 5:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Corey Tisdale <corey(at)eyewantmedia(dot)com> writes:
>> We're coming from mysql 4, and changing the sort order changes the
>> values of all columns as you would expect, given that you would
>> expect
>> a sort statement to affect grouping. This certainly isn't the only
>> time I've used this syntax. I've been mysql user for ten years, and
>> the outcome has been consistant across hundreds of tables and
>> millions
>> of rows and thousands of queries. If you ever have to use or modify a
>> mysql db, just keep this in mind in case it saves you some time.
>
> Okay, I got sufficiently interested to drag out the nearest copy of
> mysql and try it ...
>
> mysql> create table t (f1 int, f2 int, f3 int);
> Query OK, 0 rows affected (0.00 sec)
>
> mysql> insert into t values(1,11,111), (1,22,222), (1,44,444),
> (1,33,333);
> Query OK, 4 rows affected (0.00 sec)
> Records: 4 Duplicates: 0 Warnings: 0
>
> mysql> insert into t values(2,55,555), (2,22,222), (2,44,444),
> (2,33,333);
> Query OK, 4 rows affected (0.00 sec)
> Records: 4 Duplicates: 0 Warnings: 0
>
> mysql> insert into t values(3,55,555), (3,22,222), (3,44,444),
> (3,77,777);
> Query OK, 4 rows affected (0.00 sec)
> Records: 4 Duplicates: 0 Warnings: 0
>
> mysql> select * from t group by f1 order by f2;
> +------+------+------+
> | f1 | f2 | f3 |
> +------+------+------+
> | 1 | 11 | 111 |
> | 2 | 55 | 555 |
> | 3 | 55 | 555 |
> +------+------+------+
> 3 rows in set (0.00 sec)
>
> mysql> select * from t group by f1 order by f2 desc;
> +------+------+------+
> | f1 | f2 | f3 |
> +------+------+------+
> | 2 | 55 | 555 |
> | 3 | 55 | 555 |
> | 1 | 11 | 111 |
> +------+------+------+
> 3 rows in set (0.00 sec)
>
> Looks to me like we're arbitrarily getting the physically-first row in
> each f1 group. It's certainly not looking for the minimum or
> maximum f2.
>
> The above is with 5.1.37, but I find essentially the same wording in
> the 3.x/4.x manual as in the 5.1 manual.
>
> Now it's certainly possible that in particular circumstances you might
> happen to get the right results --- for example, a scan that was using
> an index might happen to deliver the rows in the right order. But I
> don't see any evidence that mysql is reliably producing groupwise
> minimums or maximums with this syntax. The long discussions in the
> comments here:
> http://dev.mysql.com/doc/refman/4.1/en/example-maximum-column-group-row.html
> don't suggest that anyone else believes it works, either.
>
> regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-03 23:47:52
Message-ID: 20091003234752.GH5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Oct 03, 2009 at 04:14:21PM -0400, Tom Lane wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> > the decision to officially bless some code as being a cast
> > rather than "just" a function seems very arbitrary
>
> It's useful when the conversion semantics are sufficiently natural that
> you want the conversion to be applied implicitly.

Thanks! After a big think I've ended up thinking the implicit casts
between the various numeric and date types are a good thing. They can
cause some confusion and semantic strangeness, but the increase in code
verbosity that results without them normally offsets these costs.

In higher assurance code this balance may tip back the other way, but
databases have more focus on having a sane set of defaults rather than
forcing you to make all the decisions up front.

> I agree that the
> explicit CAST syntax hasn't got very much to recommend it over a
> function call. That part you can blame on the SQL committee ;-) ...

What more would you want them to do? Casts that is, the SQL committee
do enough I think!

> the historical PostQUEL syntax for this was exactly a function call,
> and you can still write it that way if you choose.

I have a feeling I'll probably carry on doing that then. I'm not sure
if I'm ever going to write enough almost overlapping bits of code that
casts would become useful.

--
Sam http://samason.me.uk/


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Embarassing GROUP question
Date: 2009-10-04 00:13:47
Message-ID: 20091004001346.GI5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Oct 03, 2009 at 06:12:20PM -0500, Corey Tisdale wrote:
> We also weren't adding image data to blobs, we were bit mapping
> faceted data to blob and shifting to allow people to shop by artist or
> color or subject matter across millions of posters. Normalized tables
> just weren't cutting it, and bit shifting up to 32 bit was crazy fast.

Just out of interest; have you tried PG's support of fancier index
types? HStore or intarray would appear to help with what you're doing.
Not quite sure what you're actually doing so my guess could be a long
way off!

--
Sam http://samason.me.uk/


From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-05 00:03:38
Message-ID: m3ljjq671h.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

>> > 8.4 has a generate_series(timestamp,timestamp,interval) which would seem
>> > to be a bit more flexible than you want.

>> Yes, I know :-). But as "generate_series(A, B, C)" can also
>> be written as "A + generate_series(0, (C - B) / C) * C" (or
>> something "flexible" like that :-)), a

> For things as complicated as timestamps I'm not sure if this is such a
> trivial transform. If you can figure out the limit then it seems easy,
> though I'm not sure how you'd do that.

What limit?

>> "generate_series(DATE, DATE)" would inter alia get rid off
>> the need to cast the result from TIMESTAMP to DATE and to
>> explicitly specify "'1 day'". Just a small, trivial enhance-
>> ment for a popular use case :-).

> Interesting, I tend to aim for maximum expressiveness not ease of
> expressiveness. It would be somewhat easy to add the above if you want
> though:

> CREATE FUNCTION generate_series(date,date)
> RETURNS SETOF date
> IMMUTABLE LANGUAGE sql AS $$
> SELECT generate_series($1::timestamp,$2::timestamp,interval '1 day')::date;
> $$;

> or I suppose you could use the integer series generation:

> SELECT $1 + generate_series(0,$2 - $1);

If I didn't know that, I would not have characterized the
feature request as "trivial".

> Hum, now I'll have to see which is "better".

> That second version seems to be slightly quicker (20 to 30%, for ranges
> from a year up to a century respectively) so you may prefer it, but the
> difference is going to be in the noise for any query I've ever used
> generate_series for.

Which of my mails made you think that I was not satisfied
with PostgreSQL's current performance?
"generate_series(DATE, DATE)" would just be syntactic sugar,
and I like sweets.

Tim


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-05 10:27:22
Message-ID: 20091005102722.GQ5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Oct 05, 2009 at 12:03:38AM +0000, Tim Landscheidt wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> >Tim Landscheidt wrote:
> >> "generate_series(A, B, C)" can also
> >> be written as "A + generate_series(0, (C - B) / C) * C"
> >
> > If you can figure out the limit then it seems easy,
> > though I'm not sure how you'd do that.
>
> What limit?

Sorry, I was calling the second parameter to generate_series the "limit".

Calculating "(C - B) / C" isn't easy for timestamps, whereas it's easy
for dates. I believe this is why there's a specific version for the
former but not the latter.

> > Hum, now I'll have to see which is "better".
>
> Which of my mails made you think that I was not satisfied
> with PostgreSQL's current performance?

Nothing, it's was my personal interest to see which was faster.

> "generate_series(DATE, DATE)" would just be syntactic sugar,
> and I like sweets.

We all do, but in software it's got to be balanced against the overhead
of maintaining support for these functions.

--
Sam http://samason.me.uk/


From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-13 22:22:04
Message-ID: m3my3v53zn.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

>> >> "generate_series(A, B, C)" can also
>> >> be written as "A + generate_series(0, (C - B) / C) * C"
>> >
>> > If you can figure out the limit then it seems easy,
>> > though I'm not sure how you'd do that.

>> What limit?

> Sorry, I was calling the second parameter to generate_series the "limit".

> Calculating "(C - B) / C" isn't easy for timestamps, whereas it's easy
> for dates. I believe this is why there's a specific version for the
> former but not the latter.

(I obviously meant "(B - A) / C" :-).) Is it? I would assume
that you just have to convert A, B and C to seconds (since
epoch) and then use a normal integer division.

> [...]
>> "generate_series(DATE, DATE)" would just be syntactic sugar,
>> and I like sweets.

> We all do, but in software it's got to be balanced against the overhead
> of maintaining support for these functions.

My knowledge of PostgreSQL's codebase is nonexistent, so I
do not know how unstable it is.

Tim


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-14 10:28:23
Message-ID: 20091014102823.GQ5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 13, 2009 at 10:22:04PM +0000, Tim Landscheidt wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > Calculating "(C - B) / C" isn't easy for timestamps, whereas it's easy
> > for dates. I believe this is why there's a specific version for the
> > former but not the latter.
>
> (I obviously meant "(B - A) / C" :-).)

Huh, I hadn't even noticed that!

> I would assume
> that you just have to convert A, B and C to seconds (since
> epoch) and then use a normal integer division.

The problem is that the Gregorian calender is far too complicated. For
example, think what would happen with an interval of "months". It
doesn't help converting to seconds because the length of a month in
seconds changes depending on which year the month is in and which
month you're actually dealing with. This makes any definition of
"division" I've ever been able to think of ill defined and hence the
above calculation won't work.

--
Sam http://samason.me.uk/


From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-27 17:01:00
Message-ID: m3d44893g3.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

> [...]
>> I would assume
>> that you just have to convert A, B and C to seconds (since
>> epoch) and then use a normal integer division.

> The problem is that the Gregorian calender is far too complicated. For
> example, think what would happen with an interval of "months". It
> doesn't help converting to seconds because the length of a month in
> seconds changes depending on which year the month is in and which
> month you're actually dealing with. This makes any definition of
> "division" I've ever been able to think of ill defined and hence the
> above calculation won't work.

Yep, you would probably need some safety margin and add a
"WHERE" clause. I should have thought about that earlier as
I recently stumbled (again) over why "INTERVAL / INTERVAL"
was not defined.

Tim


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-27 17:06:03
Message-ID: 20091027170603.GO5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 27, 2009 at 05:01:00PM +0000, Tim Landscheidt wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > any definition of "division" I've ever been able to think of [is]
> > ill defined
>
> Yep, you would probably need some safety margin and add a
> "WHERE" clause. I should have thought about that earlier as
> I recently stumbled (again) over why "INTERVAL / INTERVAL"
> was not defined.

Not sure what you mean by a "safety margin", but I don't think it would
help much. Hours are defined using seconds (they're *always* 3600
seconds long, but, say, a day isn't *always* 24 hours long) so I don't
see what a safety margin would do.

--
Sam http://samason.me.uk/


From: Tim Landscheidt <tim(at)tim-landscheidt(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-27 18:53:55
Message-ID: m3vdi07jng.fsf@passepartout.tim-landscheidt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

>> > any definition of "division" I've ever been able to think of [is]
>> > ill defined

>> Yep, you would probably need some safety margin and add a
>> "WHERE" clause. I should have thought about that earlier as
>> I recently stumbled (again) over why "INTERVAL / INTERVAL"
>> was not defined.

> Not sure what you mean by a "safety margin", but I don't think it would
> help much. Hours are defined using seconds (they're *always* 3600
> seconds long, but, say, a day isn't *always* 24 hours long) so I don't
> see what a safety margin would do.

A month can last 28 to 31 days and a year 365 to 366 days,
but for example:

| tim=# SELECT EXTRACT('epoch' FROM '1 month'::INTERVAL) / 60.0 / 60 / 24;
| ?column?
| ----------
| 30
| (1 Zeile)

| tim=#

You would have to adjust the result of "(EXTRACT('epoch'
FROM B) - EXTRACT('epoch' FROM A)) / EXTRACT('epoch' FROM
C)" by a factor of 31/30 (30/28? 28/30?) and then chop off
timestamps after B with a "WHERE" clause.

JFTR: Hours can of course also be 3601 (or theoretically
3599) seconds long, but not in PostgreSQL :-).

Tim


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-28 10:41:30
Message-ID: 20091028104130.GP5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 27, 2009 at 06:53:55PM +0000, Tim Landscheidt wrote:
> You would have to adjust the result of "(EXTRACT('epoch'
> FROM B) - EXTRACT('epoch' FROM A)) / EXTRACT('epoch' FROM
> C)" by a factor of 31/30 (30/28? 28/30?) and then chop off
> timestamps after B with a "WHERE" clause.

I'm not sure where you're going with this. The original idea was
generate_series for intervals wasn't it? When you start moving from
intervals to specific periods of time (i.e. 1st Jan 1970) then you've
lost the reason for working with intervals and you may as well just be
working with dates or timestamps. The purpose of intervals, as far as
I can tell, is so that you can use things like '1 month' and have it do
the "right" thing in our Gregorian calender.

> JFTR: Hours can of course also be 3601 (or theoretically
> 3599) seconds long, but not in PostgreSQL :-).

Depending on the standard you use, yes. BTW, I believe up to two leap
seconds are allowed forward in UTC. I believe there are also plans to
drop leap seconds and let time slowly drift out of alignment. I think
the idea is that when it starts to matter to people, in a thousand years
or so, we'll be an interplanetary species anyway and tying time to earth
this way is thought to be silly. It also unnecessarily complicates
things that don't really care and not be good enough for things that do
care.

--
Sam http://samason.me.uk/


From: Lew <noone(at)lwsc(dot)ehost-services(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Procedure for feature requests?
Date: 2009-10-31 13:48:50
Message-ID: hchf83$soj$1@news.albasani.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tim Landscheidt wrote:
>> You would have to adjust the result of "(EXTRACT('epoch'
>> FROM B) - EXTRACT('epoch' FROM A)) / EXTRACT('epoch' FROM
>> C)" by a factor of 31/30 (30/28? 28/30?) and then chop off
>> timestamps after B with a "WHERE" clause.

Sam Mason wrote:
> I'm not sure where you're going with this. The original idea was
> generate_series for intervals wasn't it? When you start moving from
> intervals to specific periods of time (i.e. 1st Jan 1970) then you've
> lost the reason for working with intervals and you may as well just be
> working with dates or timestamps. The purpose of intervals, as far as
> I can tell, is so that you can use things like '1 month' and have it do
> the "right" thing in our Gregorian calender.

Tim Landscheidt wrote:
>> JFTR: Hours can of course also be 3601 (or theoretically
>> 3599) seconds long, but not in PostgreSQL :-).

Sam Mason wrote:
> Depending on the standard you use, yes. BTW, I believe up to two leap
> seconds are allowed forward in UTC. I believe there are also plans to
> drop leap seconds and let time slowly drift out of alignment. I think
> the idea is that when it starts to matter to people, in a thousand years
> or so, we'll be an interplanetary species anyway and tying time to earth
> this way is thought to be silly. It also unnecessarily complicates
> things that don't really care and not be good enough for things that do
> care.

And don't forget that day length can vary by at least an hour either way from
24 depending on the date and geographic location. Here in the U.S., tomorrow
(November 1, 2009) will be 25 hours long in most, but not all, jurisdictions.

--
Lew