Re: Bug 1500

Lists: pgsql-hackers
From: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bug 1500
Date: 2005-03-25 19:54:40
Message-ID: 42446C80.9040002@sysmaster.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I have found what is causing the crash described in Bug 1500. Now I
would like to fix it, but need opinions about what is the correct behaviour.

The bug can be easily duplicated when formatting interval in to_char()
using 'Mon' or 'Month' in the format string.

select to_char(now() - '20011001'::date, 'YYYYMonDD');
(server process crash follows)

What happens:
1. The formatting function used is dch_date()
(src/backend/utils/adt/formatting.c) and it works on struct pg_tm.
2. The interval2tm() (src/backend/utils/adt/timestamp.c) is used to
convert the interval into pg_tm struct.
2a. If the Interval parameter has month != 0, then month and year are
filled in pg_tm
2b. If not -> they are set to 0 and only days, hours, minutes, seconds
are filled (this is the case when the bug appears).
3. dch_date() expects the struct pg_tm to have valid 1-based month index
and directly references the months/months_full arrays using (tm->month -
1) as index to get the short/full name of the month.
4. SIGSEGV in the server process

This could be easily by not allowing the bad array indexing, but it
raises a bigger problem: How is supposed the to_char() function to
format interval datatype? What is the correct output?

Should we:
1) Try to fill the missing data (years, months) using the days (but how
many days are in one month? hardcode 30/31? how many days in 1 year
then...) and fix the formatting function to ignore string based
formatting for intervals
2) Fail the entire statement (do not support interval formatting with
to_char())

Also the general to_char() Interval formatting seems broken anyway.
Note that the following (and similar) works now, but the result doesn't
seem to be correct:

test=> select to_char(now() - '20011001'::date, 'YYYYDD');
to_char
---------
000112
(1 row)

test=> select now() - '20011001'::date;
?column?
-------------------------------
1271 days 12:48:18.1216260046
(1 row)

So this bug actually brings the issue of interval to_char() formatting.
Opinions?

Regards,
Lyubomir Petrov


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug 1500
Date: 2005-03-25 20:33:44
Message-ID: 12189.1111782824@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Lyubomir Petrov <lpetrov(at)sysmaster(dot)com> writes:
> I have found what is causing the crash described in Bug 1500. Now I
> would like to fix it, but need opinions about what is the correct behaviour.

Yeah, I just came to the same conclusion a little while ago:
http://archives.postgresql.org/pgsql-hackers/2005-03/msg00908.php

> Also the general to_char() Interval formatting seems broken anyway.

Karel Zak has stated repeatedly that interval_to_char is fundamentally
wrong and should be removed. I'm not sure it's quite as bad as that,
but it does seem that a different set of formatting codes is needed for
intervals as opposed to timestamps. Textual 'MON' doesn't even make any
sense for intervals really, AFAICS. I could see displaying an interval
in terms of '4 months', but 'April' makes no sense.

Does Oracle have to_char for intervals, and if so how do they define it?

Anyway, even if we think it's broken enough to remove going forward,
we need some kind of stopgap fix to prevent the coredump in existing
releases.

regards, tom lane


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug 1500
Date: 2005-03-25 20:53:53
Message-ID: 200503251253.54061.scrawford@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> So this bug actually brings the issue of interval to_char()
> formatting. Opinions?

In digging around I discovered that it appears a decision was made to
remove to_char(interval) at the 8.1 release but I've been unable to
find the replacement for this functionality. This alarms me.

Given the messages I've seen regarding to_char(interval), it's clearly
a function that is used. As an example, in our telephony systems
there is a column for start_time and for end_time. Billing involves a
sum(end_time-start_time) for the appropriate project/client/period.
Naturally, that interval needs to be displayed appropriately.

The most common request I've seen (and it would be very helpful for me
as well) is the ability to fill the largest displayed time increment
with all remaining time in the interval.

In other words when the total increment is 7 days, 7 hours, 28
minutes, 12 seconds the desired output would be 10528 minutes 12
seconds. Think phone-billing, race times, mission clocks, etc.

So...

1) Is there really a plan to eliminate to_char(interval)?

2) If so, what is the replacement?

3) If there isn't a replacement and it's just scheduled for
elimination, what harm was to_char(interval) causing to require its
removal and what's the best way to lobby for its retention and
improvement?

Cheers,
Steve


From: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug 1500
Date: 2005-03-25 21:22:36
Message-ID: 4244811C.80502@sysmaster.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Steve Crawford wrote:

>>So this bug actually brings the issue of interval to_char()
>>formatting. Opinions?
>>
>>
>
>In digging around I discovered that it appears a decision was made to
>remove to_char(interval) at the 8.1 release but I've been unable to
>find the replacement for this functionality. This alarms me.
>
>Given the messages I've seen regarding to_char(interval), it's clearly
>a function that is used. As an example, in our telephony systems
>there is a column for start_time and for end_time. Billing involves a
>sum(end_time-start_time) for the appropriate project/client/period.
>Naturally, that interval needs to be displayed appropriately.
>
>The most common request I've seen (and it would be very helpful for me
>as well) is the ability to fill the largest displayed time increment
>with all remaining time in the interval.
>
>In other words when the total increment is 7 days, 7 hours, 28
>minutes, 12 seconds the desired output would be 10528 minutes 12
>seconds. Think phone-billing, race times, mission clocks, etc.
>
>So...
>
>1) Is there really a plan to eliminate to_char(interval)?
>
>2) If so, what is the replacement?
>
>3) If there isn't a replacement and it's just scheduled for
>elimination, what harm was to_char(interval) causing to require its
>removal and what's the best way to lobby for its retention and
>improvement?
>
>Cheers,
>Steve
>
>.
>
>
>
Steve,

I am with you on this. The interval functionality is very useful and it
will be bad if it gets eliminated. I believe that the best course of
action is to keep the to_char(interval) but restrict the available
format specifications (the textual representation specificators like
Mon/Months).

Regards,
Lyubomir Petrov


From: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug 1500
Date: 2005-03-25 21:28:08
Message-ID: 42448268.4050600@sysmaster.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>Lyubomir Petrov <lpetrov(at)sysmaster(dot)com> writes:
>
>
>>I have found what is causing the crash described in Bug 1500. Now I
>>would like to fix it, but need opinions about what is the correct behaviour.
>>
>>
>
>Yeah, I just came to the same conclusion a little while ago:
>http://archives.postgresql.org/pgsql-hackers/2005-03/msg00908.php
>
>
>
>>Also the general to_char() Interval formatting seems broken anyway.
>>
>>
>
>Karel Zak has stated repeatedly that interval_to_char is fundamentally
>wrong and should be removed. I'm not sure it's quite as bad as that,
>but it does seem that a different set of formatting codes is needed for
>intervals as opposed to timestamps. Textual 'MON' doesn't even make any
>sense for intervals really, AFAICS. I could see displaying an interval
>in terms of '4 months', but 'April' makes no sense.
>
>Does Oracle have to_char for intervals, and if so how do they define it?
>
>Anyway, even if we think it's broken enough to remove going forward,
>we need some kind of stopgap fix to prevent the coredump in existing
>releases.
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>.
>
>
>
Tom,

Well, I can see how the to_char functionality can be very useful for
intervals - one can get the interval in days only, months and days, etc.
But I think that the format specifications that convert to strings
should be disallowed for intervals (Mon, Month, etc...).

If we decide just to ignore the non-supported format code we can
1) make dch_date aware that it is called for interval and limit the
choices (ignore the attempt to show textual name representation for
example)
2) just ignore the attempt to show month name on invalid value in struct
pg_tm.

In the second case we'll need to change only this file several times
using something like (this is good to be there anyway because of the
array indexing):

case DCH_Mon:
+ if (tm->tm_mon > 0) {
+ strcpy(inout, months[tm->tm_mon - 1]);
+ return 2;
+ }
+ return -1;
+
- strcpy(inout, months[tm->tm_mon - 1]);
- return 2;

The first case will probably have more impact. I think we can go with 2)
for 8.0.2 and 1) for 8.1.

Oracle has to_char() on intervals, but generally does not allow fancy
formatting (limited format specifications only - FF, TZD, TZH, TZM, and
TZR - which are not very useful anyway).

Regards,
Lyubomir Petrov


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug 1500
Date: 2005-03-26 01:03:09
Message-ID: 20404.1111798989@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Steve Crawford <scrawford(at)pinpointresearch(dot)com> writes:
> In digging around I discovered that it appears a decision was made to
> remove to_char(interval) at the 8.1 release but I've been unable to
> find the replacement for this functionality. This alarms me.

Yeah. Karel Zak, who wrote that code, is convinced we should remove it,
but I don't think anyone else is ...

regards, tom lane


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>, List pgsql-hackers <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Bug 1500
Date: 2005-03-26 01:18:40
Message-ID: 1111799920.2388.61.camel@petra
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-03-25 at 15:33 -0500, Tom Lane wrote:
> Lyubomir Petrov <lpetrov(at)sysmaster(dot)com> writes:
> > I have found what is causing the crash described in Bug 1500. Now I
> > would like to fix it, but need opinions about what is the correct behaviour.
>
> Yeah, I just came to the same conclusion a little while ago:
> http://archives.postgresql.org/pgsql-hackers/2005-03/msg00908.php
>
> > Also the general to_char() Interval formatting seems broken anyway.
>
> Karel Zak has stated repeatedly that interval_to_char is fundamentally
> wrong and should be removed. I'm not sure it's quite as bad as that,
> but it does seem that a different set of formatting codes is needed for
> intervals as opposed to timestamps.

Exactly. We had many discussions about it. Well, short summary:

the current to_char(interval) is:

interval -> struct tm -> string

and it's definitely bad. You can't formatting interval as date/time
string and you can't use calendar practices in particular case.

The right solution is conversion:

interval -> interval-string

and it means definitely other (new) code for to_char(interval). I think
useful for to_char(interval) is only format parser from formatting.c,
it's 5% of all to_char() code :-(

I don't think we want to maintain useless code in PG and answer every
month in PG lists questions "why doesn't work it?". It's better remove
it and wait for someone who write better implementation.

BTW, I have started work on formatting library:

http://people.redhat.com/kzak/libfmt/

contributors, volunteers? :-)

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>, List pgsql-hackers <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Bug 1500
Date: 2005-03-26 01:32:46
Message-ID: 1111800766.2388.70.camel@petra
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-03-25 at 20:03 -0500, Tom Lane wrote:
> Steve Crawford <scrawford(at)pinpointresearch(dot)com> writes:
> > In digging around I discovered that it appears a decision was made to
> > remove to_char(interval) at the 8.1 release but I've been unable to
> > find the replacement for this functionality. This alarms me.
>
> Yeah. Karel Zak, who wrote that code, is convinced we should remove it,
> but I don't think anyone else is ...

I think I was Peter and Josh Berkus who convinced me that the code is
bed. "we should remove..." is opinion only...

http://groups-
beta.google.com/group/comp.databases.postgresql.hackers/browse_frm/thread/a43f02de8017cabb/c290bc55d5e1e6b2?q=to_char(interval)+done&rnum=1#c290bc55d5e1e6b2

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Steve Crawford <scrawford(at)pinpointresearch(dot)com>, Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>, List pgsql-hackers <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Bug 1500
Date: 2005-03-26 01:53:16
Message-ID: 1111801996.2388.77.camel@petra
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2005-03-26 at 02:32 +0100, Karel Zak wrote:
> On Fri, 2005-03-25 at 20:03 -0500, Tom Lane wrote:
> > Steve Crawford <scrawford(at)pinpointresearch(dot)com> writes:
> > > In digging around I discovered that it appears a decision was made to
> > > remove to_char(interval) at the 8.1 release but I've been unable to
> > > find the replacement for this functionality. This alarms me.
> >
> > Yeah. Karel Zak, who wrote that code, is convinced we should remove it,
> > but I don't think anyone else is ...
>
> I think I was Peter and Josh Berkus who convinced me that the code is
> bed. "we should remove..." is opinion only...

s/bed/bad/ :-)

.. but my body dreams about bed, good night (morning?),

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: Lyubomir Petrov <lpetrov(at)sysmaster(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug 1500
Date: 2005-03-26 03:31:17
Message-ID: 20050326033117.GA20688@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 25, 2005 at 12:53:53 -0800,
Steve Crawford <scrawford(at)pinpointresearch(dot)com> wrote:
>
> 2) If so, what is the replacement?

You should be able to use EXTRACT, some math to do your own formatting.
For common operations you can define SQL functions to do what you want.
Having to_char(interval) may be more convenient (if it does what you
want), but you can get by without it.