Re: Rectifying wrong Date outputs

Lists: pgsql-hackers
From: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Rectifying wrong Date outputs
Date: 2011-03-16 12:21:55
Message-ID: AANLkTinveA_yn5RJQofpJjBrq3mEydK6jvFbWbGphso=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I was randomly testing some date related stuff on PG & observed that the
outputs were wrong.

e.g.
postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
to_date
------------
3910-01-01 <--------- Look at this
(1 row)

postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY');
to_date
------------
2010-01-01
(1 row)

User can provide the year in any format i.e. 1-digit to 4-digit, in this
case the internal representation of century should be logically correct.

Considering this situation, I am drawing the table below & logging my
observation here. This might help us to demonstrate where we are lacking in
PG.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-1', 'DD-MON-Y') 2001-01-01 01-JAN-2001*
*TO_DATE('01-jan-1', 'DD-MON-YY') 2001-01-01 01-JAN-2001*
*TO_DATE('01-jan-1', 'DD-MON-YYY') 2001-01-01 01-JAN-2001*
*TO_DATE('01-jan-1', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001*

In this case, all the cases seems correct. Also the YEAR part in the output
is seems logical.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-10', 'DD-MON-Y') 2010-01-01 Error *
*TO_DATE('01-jan-10', 'DD-MON-YY') 2010-01-01 01-JAN-2010*
*TO_DATE('01-jan-10', 'DD-MON-YYY') 2010-01-01 01-JAN-2010*
*TO_DATE('01-jan-10', 'DD-MON-YYYY') 0010-01-01 01-JAN-0010*

In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
1st case the output is not correct since the Format ('Y') is lesser than the
actual input ('10'). But PG is ignoring this condition and throwing whatever
is input. The output year is might not be the year, what user is expecting.
Hence PG should throw an error.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-067', 'DD-MON-Y') 2067-01-01 Error*
*TO_DATE('01-jan-111', 'DD-MON-YY') 2011-01-01 Error*
*TO_DATE('01-jan-678', 'DD-MON-YYY') 1678-01-01 01-JAN-2678*
*TO_DATE('01-jan-001', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001*

In this case, only last case seems correct in PG. Rest other cases are might
not be logical, rather the output is vague. In PG, I haven't seen any
document which is saying something like this, if year is 111...999 then the
century would be 2000 and 001...099 then then century would be 1000.
However, the 1st and 2nd case should throw an error since the output format
('Y' & 'YY') are really not matching with the Input ('067' & '111'),
respectively.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-2010', 'DD-MON-Y') 4010-01-01 Error*
*TO_DATE('01-jan-2010', 'DD-MON-YY') 3910-01-01 Error *
*TO_DATE('01-jan-2010', 'DD-MON-YYY') 3010-01-01 Error *
*TO_DATE('01-jan-2010', 'DD-MON-YYYY') 2010-01-01 01-JAN-2010*

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Again it should throw error in these cases, because Output Format
is not matching with Input Data. The exception here is the 2nd case, where
century is well-defined.

After observing the all above cases, the summary would be, if the output
format is lesser than the actual input value, then it should throw an error.
Considering this thumb rule, we can fix the wrong outputs in PG. I have made
necessary changes to the code & attaching the patch with this email.

In the patch, I have written one centralize function which will decide the
century depends upon the given Input format.

Thoughts ? Any other ideas on this ?

--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachment Content-Type Size
Fix_Century_V1_PG.patch text/x-patch 3.7 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-16 21:38:09
Message-ID: AANLkTikf--t2nGjAAimyxpbfO3ts06Lu+iBfSSeMYTo_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
<piyush(dot)newe(at)enterprisedb(dot)com> wrote:
> Data Format                 PostgreSQL EDBAS
> TO_DATE('01-jan-10',  'DD-MON-Y')         2010-01-01 Error
> TO_DATE('01-jan-10',  'DD-MON-YY')         2010-01-01 01-JAN-2010
> TO_DATE('01-jan-10',  'DD-MON-YYY') 2010-01-01 01-JAN-2010
> TO_DATE('01-jan-10',  'DD-MON-YYYY') 0010-01-01 01-JAN-0010
> In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
> 1st case the output is not correct since the Format ('Y') is lesser than the
> actual input ('10'). But PG is ignoring this condition and throwing whatever
> is input. The output year is might not be the year, what user is expecting.
> Hence PG should throw an error.

I can't get worked up about this. If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

> Data Format                 PostgreSQL EDBAS
> TO_DATE('01-jan-2010', 'DD-MON-Y') 4010-01-01 Error
> TO_DATE('01-jan-2010', 'DD-MON-YY') 3910-01-01 Error
> TO_DATE('01-jan-2010', 'DD-MON-YYY') 3010-01-01 Error
> TO_DATE('01-jan-2010', 'DD-MON-YYYY') 2010-01-01 01-JAN-2010

These cases look a lot stranger. I'm inclined to think that if the
number of digits specified exceeds the number of Ys, then we can
either (1) throw an error, as you suggest or (2) give the same answer
we would have given if the number of Ys were equal to the number of
digits given. In other words, if we're not going to error out here,
all of these should return 2010-01-01.

> Data Format PostgreSQL EDBAS
> TO_DATE('01-jan-067', 'DD-MON-Y') 2067-01-01 Error
> TO_DATE('01-jan-111', 'DD-MON-YY') 2011-01-01 Error
> TO_DATE('01-jan-678', 'DD-MON-YYY') 1678-01-01 01-JAN-2678
> TO_DATE('01-jan-001', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001

These are so strange that it's hard to reason about them; who uses
three-digit years? In the third case above, you should EDBAS
deciding that 678 means 2678 instead of 1678, but that seems quite
arbitrary. 1678 seems just as plausible. But the behavior in the
second case looks wrong (shouldn't the answer should be either 1111 or
2111?) and the first case looks inconsistent with the third one (why
does 067 mean 2067 rather than 1967 while 678 means 1678 rather than
2678?).

I'm inclined to think that we have a bug here in the case where the #
of digits given is greater than the # of Ys. See also this:

rhaas=# select to_date('01-jan-678', 'DD-MON-Y');
to_date
------------
2678-01-01
(1 row)

rhaas=# select to_date('01-jan-678', 'DD-MON-YY');
to_date
------------
2578-01-01
(1 row)

rhaas=# select to_date('01-jan-678', 'DD-MON-YYY');
to_date
------------
1678-01-01
(1 row)

It's a lot less clear to me that we have a bug in the other direction
(# of digits given is less than the # of Ys), but maybe....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-16 21:52:41
Message-ID: 201103162152.p2GLqfX12428@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
> <piyush(dot)newe(at)enterprisedb(dot)com> wrote:
> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
> > TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
> > TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
> > TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
> > TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
> > In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
> > 1st case the output is not correct since the Format ('Y') is lesser than the
> > actual input ('10'). But PG is ignoring this condition and throwing whatever
> > is input. The output year is might not be the year, what user is expecting.
> > Hence PG should throw an error.
>
> I can't get worked up about this. If there's a consensus that
> throwing an error here is better, fine, but on first blush the PG
> behavior doesn't look unreasonable to me.
>
> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

To clarify, the user is reporting EDB Advanced Server, though the
community PG has the same issues, or at least similar; with git HEAD:

test=> SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
to_date
------------
3910-01-01
(1 row)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-16 21:56:00
Message-ID: AANLkTi=wYrdkRZ5T6AqKHYMPD2TE4vNiJd2KGpfhdMup@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
>> <piyush(dot)newe(at)enterprisedb(dot)com> wrote:
>> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
>> > TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
>> > TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
>> > TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
>> > TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
>> > In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
>> > 1st case the output is not correct since the Format ('Y') is lesser than the
>> > actual input ('10'). But PG is ignoring this condition and throwing whatever
>> > is input. The output year is might not be the year, what user is expecting.
>> > Hence PG should throw an error.
>>
>> I can't get worked up about this.  If there's a consensus that
>> throwing an error here is better, fine, but on first blush the PG
>> behavior doesn't look unreasonable to me.
>>
>> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
>
> To clarify, the user is reporting EDB Advanced Server, though the
> community PG has the same issues, or at least similar;  with git HEAD:
>
>        test=> SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
>          to_date
>        ------------
>         3910-01-01
>        (1 row)

Actually, I think he's comparing PostgreSQL to Advanced Server.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-16 22:00:43
Message-ID: 201103162200.p2GM0hX13708@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
> >> <piyush(dot)newe(at)enterprisedb(dot)com> wrote:
> >> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
> >> > TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
> >> > TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
> >> > TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
> >> > TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
> >> > In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
> >> > 1st case the output is not correct since the Format ('Y') is lesser than the
> >> > actual input ('10'). But PG is ignoring this condition and throwing whatever
> >> > is input. The output year is might not be the year, what user is expecting.
> >> > Hence PG should throw an error.
> >>
> >> I can't get worked up about this. ?If there's a consensus that
> >> throwing an error here is better, fine, but on first blush the PG
> >> behavior doesn't look unreasonable to me.
> >>
> >> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
> >
> > To clarify, the user is reporting EDB Advanced Server, though the
> > community PG has the same issues, or at least similar; ?with git HEAD:
> >
> > ? ? ? ?test=> SELECT TO_DATE('01-jan-2010', ?'DD-MON-YY');
> > ? ? ? ? ?to_date
> > ? ? ? ?------------
> > ? ? ? ? 3910-01-01
> > ? ? ? ?(1 row)
>
> Actually, I think he's comparing PostgreSQL to Advanced Server.

Oh, I understand now. I was confused that the headings didn't line up
with the values. I see now the first value is community PG and the
second is EDBAS.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-17 05:30:06
Message-ID: AANLkTimCej4qSjeat1hsh-3Bw+mWCXzQ8_1iqb3WuwLk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
Advanced Server.
Thanks Burce for clarification.

For the 1-digit, 2-digit & 3-digit Year inputs, as I said, I didn't see any
document in PG which will explain what would be the century considered if it
is not given. If I missed out it somewhere please let me know.

I refer to following link which explains the behavior of Oracle.
http://forums.oracle.com/forums/thread.jspa?threadID=312239

Now, if
1. # of digits given is greater than the # of Ys

i.e.
postgres=# select to_date('01-jan-111', 'DD-MON-Y');
to_date
------------
2111-01-01
(1 row)

What we should do ? Either we should throw an error or we should give what
user has provided.
IMHO, we should follow what format is given by user. However, even if the
'format' gets wrong rather invalid, it is not throwing any error.

e.g.
postgres=# select to_date('01-jan-111', 'DD-MON-Y POSTGRES'); <<<<---- Look
at this
to_date
------------
2111-01-01
(1 row)

2. # of digits given is less than the # of Ys
Consider following case,

postgres=# select to_date('01-jan-6', 'DD-MON-Y'), to_date('01-jan-6',
'DD-MON-YYYY');
to_date | to_date
------------+------------
2006-01-01 | 0006-01-01

Why this behaviour not predictable ? I think we are always considering the
current century, if it is not provided. If I missed out any related
document, please share.

And yes,

postgres=# select to_date('01-jan-1761', 'DD-MON-Y');
to_date
------------
3761-01-01 <<<----- Look at this.
(1 row)

Definitely, their is a bug in this case.

Am I missing something ?

-Piyush

On Thu, Mar 17, 2011 at 3:30 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> Robert Haas wrote:
> > On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > Robert Haas wrote:
> > >> On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
> > >> <piyush(dot)newe(at)enterprisedb(dot)com> wrote:
> > >> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
> > >> > TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
> > >> > TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
> > >> > TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
> > >> > TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
> > >> > In this case, it seems in last 3 cases PG is behaving correctly.
> Whereas in
> > >> > 1st case the output is not correct since the Format ('Y') is lesser
> than the
> > >> > actual input ('10'). But PG is ignoring this condition and throwing
> whatever
> > >> > is input. The output year is might not be the year, what user is
> expecting.
> > >> > Hence PG should throw an error.
> > >>
> > >> I can't get worked up about this. ?If there's a consensus that
> > >> throwing an error here is better, fine, but on first blush the PG
> > >> behavior doesn't look unreasonable to me.
> > >>
> > >> > Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
> > >
> > > To clarify, the user is reporting EDB Advanced Server, though the
> > > community PG has the same issues, or at least similar; ?with git HEAD:
> > >
> > > ? ? ? ?test=> SELECT TO_DATE('01-jan-2010', ?'DD-MON-YY');
> > > ? ? ? ? ?to_date
> > > ? ? ? ?------------
> > > ? ? ? ? 3910-01-01
> > > ? ? ? ?(1 row)
> >
> > Actually, I think he's comparing PostgreSQL to Advanced Server.
>
> Oh, I understand now. I was confused that the headings didn't line up
> with the values. I see now the first value is community PG and the
> second is EDBAS.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +
>

--
--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-17 13:46:32
Message-ID: 1300369375-sup-8151@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:
> Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
> Advanced Server.
> Thanks Burce for clarification.
>
> For the 1-digit, 2-digit & 3-digit Year inputs, as I said, I didn't see any
> document in PG which will explain what would be the century considered if it
> is not given. If I missed out it somewhere please let me know.

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges. Some of it has been fixed,
but a lot of bugs remain. Looks like this is one of those places and it
seems appropriate to spend some time fixing it. Since it would involve
a behavior change, it should only go to 9.2, of course.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-17 14:09:56
Message-ID: AANLkTim9Ys8x8wy8gcYihdPdZHxrKtp19vr_mNUN+x=j@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:
>> Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
>> Advanced Server.
>> Thanks Burce for clarification.
>>
>> For the 1-digit, 2-digit & 3-digit Year inputs, as I said, I didn't see any
>> document in PG which will explain what would be the century considered if it
>> is not given. If I missed out it somewhere please let me know.
>
> Keep in mind that the datetime stuff was abandoned by the maintainer
> some years ago with quite some rough edges.  Some of it has been fixed,
> but a lot of bugs remain.  Looks like this is one of those places and it
> seems appropriate to spend some time fixing it.  Since it would involve
> a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut. I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday. It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-17 14:26:30
Message-ID: 11871.1300371990@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>> Keep in mind that the datetime stuff was abandoned by the maintainer
>> some years ago with quite some rough edges. Some of it has been fixed,
>> but a lot of bugs remain. Looks like this is one of those places and it
>> seems appropriate to spend some time fixing it. Since it would involve
>> a behavior change, it should only go to 9.2, of course.

> I wouldn't object to fixing the problem with # of digits > # of Ys in
> 9.1, if the fix is simple and clear-cut. I think we are still
> accepting patches to make minor tweaks, like the tab-completion patch
> I committed yesterday. It also doesn't bother me tremendously if we
> push it off, but I don't think that anyone's going to be too sad if
> TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
> sensible than 3010-01-01.

Agreed, it's certainly not too late for bug fixes in 9.1. I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.

Having said that, it's not entirely clear to me what sane behavior is
here. Personally I would expect that an n-Ys format spec would consume
at most n digits from the input. Otherwise how are you going to use
to_date to pick apart strings that don't have any separators? So
I think the problem is actually upstream of the behavior complained of
here. However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-17 14:29:46
Message-ID: 1300372119-sup-2466@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Robert Haas's message of jue mar 17 11:09:56 -0300 2011:
> On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:

> > Keep in mind that the datetime stuff was abandoned by the maintainer
> > some years ago with quite some rough edges.  Some of it has been fixed,
> > but a lot of bugs remain.  Looks like this is one of those places and it
> > seems appropriate to spend some time fixing it.  Since it would involve
> > a behavior change, it should only go to 9.2, of course.
>
> I wouldn't object to fixing the problem with # of digits > # of Ys in
> 9.1, if the fix is simple and clear-cut. I think we are still
> accepting patches to make minor tweaks, like the tab-completion patch
> I committed yesterday. It also doesn't bother me tremendously if we
> push it off, but I don't think that anyone's going to be too sad if
> TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
> sensible than 3010-01-01.

If it can be delivered quickly and it is simple, sure. But anything
more involved should respect the release schedule.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Piyush Newe" <piyush(dot)newe(at)enterprisedb(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-17 14:40:27
Message-ID: 4D81D70B020000250003BA19@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> what we should first do is see what Oracle does in such cases,
> because the main driving factor for these functions is Oracle
> compatibility not what might seem sane in a vacuum.

+1

-Kevin


From: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 05:40:01
Message-ID: AANLkTi=72rJjsfuRCuUvg+3Y9t6bObDfdw38NrpCEwPZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
> > <alvherre(at)commandprompt(dot)com> wrote:
> >> Keep in mind that the datetime stuff was abandoned by the maintainer
> >> some years ago with quite some rough edges. Some of it has been fixed,
> >> but a lot of bugs remain. Looks like this is one of those places and it
> >> seems appropriate to spend some time fixing it. Since it would involve
> >> a behavior change, it should only go to 9.2, of course.
>
> > I wouldn't object to fixing the problem with # of digits > # of Ys in
> > 9.1, if the fix is simple and clear-cut. I think we are still
> > accepting patches to make minor tweaks, like the tab-completion patch
> > I committed yesterday. It also doesn't bother me tremendously if we
> > push it off, but I don't think that anyone's going to be too sad if
> > TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
> > sensible than 3010-01-01.
>
> Agreed, it's certainly not too late for bug fixes in 9.1. I agree
> that this isn't something we would want to tweak in released branches,
> but 9.1 isn't there yet.
>
>
I feel the patch for the same would be easier and was attached in the
initial mail of this mail thread. For your ready reference, I am attaching
the same patch here again.

> Having said that, it's not entirely clear to me what sane behavior is
> here. Personally I would expect that an n-Ys format spec would consume
> at most n digits from the input. Otherwise how are you going to use
> to_date to pick apart strings that don't have any separators? So
> I think the problem is actually upstream of the behavior complained of
> here. However, what we should first do is see what Oracle does in such
> cases, because the main driving factor for these functions is Oracle
> compatibility not what might seem sane in a vacuum.
>
>
Following is the extended chart which is comparing the behavior of Oracle,
PG & EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
Oracle is throwing error in 1st case since the Format ('Y') is lesser than
the actual value ('10'). But PostgreSQL is ignoring this case and throwing
whatever is input. The output is might not be the same was user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, just last case was correct in PG. Rest other cases are not in
sync with Oracle, rather the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
the format is lesser than the actual value. It seems this rule is not
applicable for 2nd case in Oracle.

In all above mentioned cases, the observation is, If the # Ys are lesser
than the # of digits,, then it should throw an error. Only in case of 'YY',
its not correct, unless the year is later than 9999. In this way, we can fix
the wrong outputs in PG.

> regards, tom lane
>

--
--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachment Content-Type Size
Fix_Century_V1_PG.patch text/x-patch 3.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 10:24:57
Message-ID: 4D872779.2070101@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.03.2011 07:40, Piyush Newe wrote:
> On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>> On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
>>> <alvherre(at)commandprompt(dot)com> wrote:
>>>> Keep in mind that the datetime stuff was abandoned by the maintainer
>>>> some years ago with quite some rough edges. Some of it has been fixed,
>>>> but a lot of bugs remain. Looks like this is one of those places and it
>>>> seems appropriate to spend some time fixing it. Since it would involve
>>>> a behavior change, it should only go to 9.2, of course.
>>
>>> I wouldn't object to fixing the problem with # of digits> # of Ys in
>>> 9.1, if the fix is simple and clear-cut. I think we are still
>>> accepting patches to make minor tweaks, like the tab-completion patch
>>> I committed yesterday. It also doesn't bother me tremendously if we
>>> push it off, but I don't think that anyone's going to be too sad if
>>> TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
>>> sensible than 3010-01-01.
>>
>> Agreed, it's certainly not too late for bug fixes in 9.1. I agree
>> that this isn't something we would want to tweak in released branches,
>> but 9.1 isn't there yet.
>>
>>
> I feel the patch for the same would be easier and was attached in the
> initial mail of this mail thread. For your ready reference, I am attaching
> the same patch here again.
>
>
>> Having said that, it's not entirely clear to me what sane behavior is
>> here. Personally I would expect that an n-Ys format spec would consume
>> at most n digits from the input. Otherwise how are you going to use
>> to_date to pick apart strings that don't have any separators?

Yeah, seems reasonable.

>> So
>> I think the problem is actually upstream of the behavior complained of
>> here. However, what we should first do is see what Oracle does in such
>> cases, because the main driving factor for these functions is Oracle
>> compatibility not what might seem sane in a vacuum.
>>
> Following is the extended chart which is comparing the behavior of Oracle,
> PG& EDBAS.
>
> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
>
> In this case, all the cases are in sync except the 1st one. I didn't
> understand why Oracle is interpreting year '1' as '2011'.

It makes sense to me. Year "1", when dat format is "Y", means the year
closest to current date that ends with 1. Or maybe the year that ends
with 1 in the current decade. This is analoguous to how two-digit years
are interpreted (except that we've hardcoded that the "current date" to
compare against is year 2000 - an assumption that will start to bite us
some time before year 2100).

> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 Error
> TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010
>
> In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
> Oracle is throwing error in 1st case since the Format ('Y') is lesser than
> the actual value ('10'). But PostgreSQL is ignoring this case and throwing
> whatever is input. The output is might not be the same was user is
> expecting.
>
> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 Error
> TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
> TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
> TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
>
> In this case, just last case was correct in PG. Rest other cases are not in
> sync with Oracle, rather the output is vague.
>
> *Data Format Oracle PostgreSQL EDBAS*
>
> TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 Error
> TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
> TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 Error
> TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
>
> In this case, PG is giving wrong output in first 3 cases. Those need to get
> rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
> the format is lesser than the actual value. It seems this rule is not
> applicable for 2nd case in Oracle.

Yeah, quite inconsistent :-(.

These results are not in favor of the idea that a format with n Ys
always consumess up to n digits from the input. With that rule,
to_date('01-jan-2010', 'DD-MON-YY') would return "01-JAN-2020", which
isn't what Oracle does and seems quite surprising too.

So ignoring the cases where Oracle throws an error but PostgreSQL
doesn't, there's four cases where the results differ:

> *Data Format Oracle PostgreSQL EDBAS*
> TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
> TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
> TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
> TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error

IMHO our current behavior in 2nd and 4th case is so bizarre that we
should change them to match Oracle. I think we should fix the 1st too,
the notion that a single-digit year means something between 2000-2009
seems pretty useless (granted, using a single digit for year is
brain-dead to begin with).

The 3rd one is debatable. The range for three-digit years is currently
1100-2099, which is enough range for many applications. But should we
change it for the sake of matching Oracle's behavior? Not that anyone
uses YYY in practice, but still.

BTW, whatever behavior we choose, this needs to be documented. I don't
see anything in the docs on how Y, YY or YYY are expanded.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 13:39:41
Message-ID: AANLkTiny0De9Sx-P72HOcPjz=zWMYg6hh4QuLa-AwSo0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Having said that, it's not entirely clear to me what sane behavior is
>>> here.  Personally I would expect that an n-Ys format spec would consume
>>> at most n digits from the input.  Otherwise how are you going to use
>>> to_date to pick apart strings that don't have any separators?
>
> Yeah, seems reasonable.

On the flip side, what if you want to allow either a two digit year or
a four digit year? It doesn't seem unreasonable to allow YY to
emcompass what YYYY would have allowed, unless there's a separate
notion for 'either YY or YYYY'.

> It makes sense to me. Year "1", when dat format is "Y", means the year
> closest to current date that ends with 1. Or maybe the year that ends with 1
> in the current decade. This is analoguous to how two-digit years are
> interpreted (except that we've hardcoded that the "current date" to compare
> against is year 2000 - an assumption that will start to bite us some time
> before year 2100).

Agree with all of this.

> So ignoring the cases where Oracle throws an error but PostgreSQL doesn't,
> there's four cases where the results differ:
>
>> *Data Format Oracle PostgreSQL EDBAS*
>> TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
>> TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
>> TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
>> TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
>
> IMHO our current behavior in 2nd and 4th case is so bizarre that we should
> change them to match Oracle. I think we should fix the 1st too, the notion
> that a single-digit year means something between 2000-2009 seems pretty
> useless (granted, using a single digit for year is brain-dead to begin
> with).

I agree, but do we understand what Oracle does categorically, rather
than just its output on this specific input?

> The 3rd one is debatable. The range for three-digit years is currently
> 1100-2099, which is enough range for many applications. But should we change
> it for the sake of matching Oracle's behavior? Not that anyone uses YYY in
> practice, but still.

I'm OK with that, but again, exactly what rule is Oracle applying here?

> BTW, whatever behavior we choose, this needs to be documented. I don't see
> anything in the docs on how Y, YY or YYY are expanded.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 13:57:57
Message-ID: 20387.1300715877@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Having said that, it's not entirely clear to me what sane behavior is
>>> here. Personally I would expect that an n-Ys format spec would consume
>>> at most n digits from the input. Otherwise how are you going to use
>>> to_date to pick apart strings that don't have any separators?

>> Yeah, seems reasonable.

> On the flip side, what if you want to allow either a two digit year or
> a four digit year? It doesn't seem unreasonable to allow YY to
> emcompass what YYYY would have allowed, unless there's a separate
> notion for 'either YY or YYYY'.

What I was thinking was that YYYY would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

> I'm OK with that, but again, exactly what rule is Oracle applying here?

Yeah. Hopefully they documented it, and we don't have to try to
reverse-engineer the intention from an undersized set of samples.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 14:02:17
Message-ID: AANLkTimiSPRE6affMZWZ3ubzNFA_RpjPb5ZP8gE4qXU2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Having said that, it's not entirely clear to me what sane behavior is
>>>> here.  Personally I would expect that an n-Ys format spec would consume
>>>> at most n digits from the input.  Otherwise how are you going to use
>>>> to_date to pick apart strings that don't have any separators?
>
>>> Yeah, seems reasonable.
>
>> On the flip side, what if you want to allow either a two digit year or
>> a four digit year?  It doesn't seem unreasonable to allow YY to
>> emcompass what YYYY would have allowed, unless there's a separate
>> notion for 'either YY or YYYY'.
>
> What I was thinking was that YYYY would take either 2 or 4 digits.
> Whatever you do here, the year will have to be delimited by a non-digit
> for such cases to be parseable.

I was assuming a slightly more general variant of that - namely, Y,
YY, or YYY would all accept that many digits, or more; and the result
of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
respectively, had been used.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 14:18:50
Message-ID: 20815.1300717130@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What I was thinking was that YYYY would take either 2 or 4 digits.
>> Whatever you do here, the year will have to be delimited by a non-digit
>> for such cases to be parseable.

> I was assuming a slightly more general variant of that - namely, Y,
> YY, or YYY would all accept that many digits, or more; and the result
> of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
> respectively, had been used.

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there's not non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Why not head in the other direction of allowing fewer digits than
suggested by the format, instead of more?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 15:43:48
Message-ID: AANLkTinTkm=-ixoOYPE-m_XKVuF5nqsjVEWWwEn2rXjs@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> What I was thinking was that YYYY would take either 2 or 4 digits.
>>> Whatever you do here, the year will have to be delimited by a non-digit
>>> for such cases to be parseable.
>
>> I was assuming a slightly more general variant of that - namely, Y,
>> YY, or YYY would all accept that many digits, or more; and the result
>> of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
>> respectively, had been used.
>
> As far as I can see, that would completely destroy the use-case of
> trying to parse a string where there's not non-digit delimiters and
> so you have to take exactly the specified number of digits, not more.

Yeah, I thought about that, but it seems that use case is already
hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
worry about it. If Piyush's table is to be believed, Oracle only
throws an error for the wrong number of digits if the format is Y or
YYY, and the actual number of digits is more. If the format is YY,
then it accepts 2, 3, or 4 digit years. And since YY is exponentially
more likely to be used than Y or YYY, that pretty much means you can't
do what you're talking about using this syntax anyway.

> Why not head in the other direction of allowing fewer digits than
> suggested by the format, instead of more?

Well, that seems a bit counterintuitive to me. I think it's much more
likely that someone wants to insist on a four-digit year (and not
allow just two digits) than that they want to insist on a two-digit
year (and not allow four digits). I also think that would be pretty
terrible for Oracle compatibility, since they're clearly interpreting
99 vs. YYYY as meaning either 0099, not 1999 or 2099. I don't think
we want to be randomly incompatible there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 16:39:24
Message-ID: 2701.1300725564@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> As far as I can see, that would completely destroy the use-case of
>> trying to parse a string where there's not non-digit delimiters and
>> so you have to take exactly the specified number of digits, not more.

> Yeah, I thought about that, but it seems that use case is already
> hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
> worry about it.

How so?

regression=# select to_date('20110321', 'YYYYMMDD');
to_date
------------
2011-03-21
(1 row)

regression=# select to_date('110321', 'YYMMDD');
to_date
------------
2011-03-21
(1 row)

If you break the latter case, I am sure the villagers will be on your
doorstep shortly.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-21 17:05:25
Message-ID: AANLkTimqoujQm+9cYtiw6Lvzee9QtzqkUcBSpxmgF1HG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> As far as I can see, that would completely destroy the use-case of
>>> trying to parse a string where there's not non-digit delimiters and
>>> so you have to take exactly the specified number of digits, not more.
>
>> Yeah, I thought about that, but it seems that use case is already
>> hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
>> worry about it.
>
> How so?
>
> regression=# select to_date('20110321', 'YYYYMMDD');
>  to_date
> ------------
>  2011-03-21
> (1 row)
>
> regression=# select to_date('110321', 'YYMMDD');
>  to_date
> ------------
>  2011-03-21
> (1 row)
>
> If you break the latter case, I am sure the villagers will be on your
> doorstep shortly.

Oh, dear. No wonder this code is so hard to get right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-22 09:07:09
Message-ID: AANLkTin1R_hwGpyq788_JrsUDPJ2sR0SEbpcHLh=rx1z@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks Heikki, Tom & Robert for your valuable inputs.

According to the code, PG is behaving what is mentioned below.

1. Format = Y
0 ... 9 = 2000 ... 2009 (we are always adding 2000 to the year)

2. Format = YY
00 ... 69 = 2000 ... 2069 (we are adding 2000 to the year)
70 ... 99 = 1970 ... 1999 (we are adding 1900 to the year)

3. Format = YYY
100 ... 999 = 1100 ... 1999 (we are adding 1000 to the year)
000 ... 099 = 2000 ... 2099 (we are adding 2000 to the year)

4. Format = YYYY
Any number -4712 to 9999

If we closely look at the code, we can say that, if the given INPUT value is
not in the range of the DESIRED format (i.e. Y/YY/YYY/YYYY), then it results
some weired YEAR.

e.g.
1. TO_DATE('01-jan-2010', 'DD-MON-Y')
Here it falls in the 1st format case i.e. "Y". As per the code, we are
blindly adding 2000 in the year value, hence the result is returned as
4010.

2. TO_DATE('01-jan-2010', 'DD-MON-YYY')
Here it falls in the 3rd case i.e. "YYY". As per the code, without checking
the input we are adding 1000 to the value & hence it results 3010.

IMHO, before deciding the century we should check what is the INPUT. This
check is missing in the code. As Heikki said, we really don't have such
document anywhere. We need to atleast document what we are doing. However,
if we are doing something vague then we need to decide what we are really
going to follow. We need to follow one standard, then it might be either
Oracle or something else.

It might happened that the given input contains the Century. In this case,
we should intelligent enough and decide not to add anything to the year. To
avoid this situation, I still feel we have to follow the format given by the
user. i.e. "if the number of digits specified exceeds the number of Y",
lets throw an error. This will make our life easier & relatively easier to
fix.

Robert,
If we follow the standard what Oracle is using, we will not break any case.

One more observation in Oracle

SQL> SELECT to_char(TO_DATE('01-jan-0001', 'DD-MON-YY'), 'DD-MON-YYYY')
from dual;

TO_CHAR(TO_DATE('01-
--------------------
01-JAN-0001

SQL> SELECT to_char(TO_DATE('01-jan-1', 'DD-MON-YY'), 'DD-MON-YYYY') from
dual;

TO_CHAR(TO_DATE('01-
--------------------
01-JAN-2001

If we observe this closely, in the later case, century is not given in the
input, hence Oracle is taking the current century. But the first case is bit
puzzling. Because if we convert the '0001' to number it is still interprets
as 1, then why it is not adding current century their ? I think it is
checking the INPUT & accordingly taking the decision.

Anyways, will it work if we check the input before deciding the century ?

On Mon, Mar 21, 2011 at 10:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> As far as I can see, that would completely destroy the use-case of
> >>> trying to parse a string where there's not non-digit delimiters and
> >>> so you have to take exactly the specified number of digits, not more.
> >
> >> Yeah, I thought about that, but it seems that use case is already
> >> hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
> >> worry about it.
> >
> > How so?
> >
> > regression=# select to_date('20110321', 'YYYYMMDD');
> > to_date
> > ------------
> > 2011-03-21
> > (1 row)
> >
> > regression=# select to_date('110321', 'YYMMDD');
> > to_date
> > ------------
> > 2011-03-21
> > (1 row)
> >
> > If you break the latter case, I am sure the villagers will be on your
> > doorstep shortly.
>
> Oh, dear. No wonder this code is so hard to get right.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


From: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rectifying wrong Date outputs
Date: 2011-03-25 04:25:51
Message-ID: AANLkTi=nvM2WtzG34c=9APi-yGFDr39cT5zVFkfXFmKO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Any inputs further ?

On Tue, Mar 22, 2011 at 2:37 PM, Piyush Newe
<piyush(dot)newe(at)enterprisedb(dot)com>wrote:

> Thanks Heikki, Tom & Robert for your valuable inputs.
>
> According to the code, PG is behaving what is mentioned below.
>
> 1. Format = Y
> 0 ... 9 = 2000 ... 2009 (we are always adding 2000 to the year)
>
> 2. Format = YY
> 00 ... 69 = 2000 ... 2069 (we are adding 2000 to the year)
> 70 ... 99 = 1970 ... 1999 (we are adding 1900 to the year)
>
> 3. Format = YYY
> 100 ... 999 = 1100 ... 1999 (we are adding 1000 to the year)
> 000 ... 099 = 2000 ... 2099 (we are adding 2000 to the year)
>
> 4. Format = YYYY
> Any number -4712 to 9999
>
> If we closely look at the code, we can say that, if the given INPUT value
> is not in the range of the DESIRED format (i.e. Y/YY/YYY/YYYY), then it
> results some weired YEAR.
>
> e.g.
> 1. TO_DATE('01-jan-2010', 'DD-MON-Y')
> Here it falls in the 1st format case i.e. "Y". As per the code, we are
> blindly adding 2000 in the year value, hence the result is returned as
> 4010.
>
> 2. TO_DATE('01-jan-2010', 'DD-MON-YYY')
> Here it falls in the 3rd case i.e. "YYY". As per the code, without checking
> the input we are adding 1000 to the value & hence it results 3010.
>
> IMHO, before deciding the century we should check what is the INPUT. This
> check is missing in the code. As Heikki said, we really don't have such
> document anywhere. We need to atleast document what we are doing. However,
> if we are doing something vague then we need to decide what we are really
> going to follow. We need to follow one standard, then it might be either
> Oracle or something else.
>
> It might happened that the given input contains the Century. In this case,
> we should intelligent enough and decide not to add anything to the year. To
> avoid this situation, I still feel we have to follow the format given by the
> user. i.e. "if the number of digits specified exceeds the number of Y",
> lets throw an error. This will make our life easier & relatively easier to
> fix.
>
> Robert,
> If we follow the standard what Oracle is using, we will not break any
> case.
>
> One more observation in Oracle
>
> SQL> SELECT to_char(TO_DATE('01-jan-0001', 'DD-MON-YY'), 'DD-MON-YYYY')
> from dual;
>
> TO_CHAR(TO_DATE('01-
> --------------------
> 01-JAN-0001
>
> SQL> SELECT to_char(TO_DATE('01-jan-1', 'DD-MON-YY'), 'DD-MON-YYYY') from
> dual;
>
> TO_CHAR(TO_DATE('01-
> --------------------
> 01-JAN-2001
>
> If we observe this closely, in the later case, century is not given in the
> input, hence Oracle is taking the current century. But the first case is bit
> puzzling. Because if we convert the '0001' to number it is still interprets
> as 1, then why it is not adding current century their ? I think it is
> checking the INPUT & accordingly taking the decision.
>
> Anyways, will it work if we check the input before deciding the century ?
>
>
> On Mon, Mar 21, 2011 at 10:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com>wrote:
>
>> On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> >> On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >>> As far as I can see, that would completely destroy the use-case of
>> >>> trying to parse a string where there's not non-digit delimiters and
>> >>> so you have to take exactly the specified number of digits, not more.
>> >
>> >> Yeah, I thought about that, but it seems that use case is already
>> >> hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
>> >> worry about it.
>> >
>> > How so?
>> >
>> > regression=# select to_date('20110321', 'YYYYMMDD');
>> > to_date
>> > ------------
>> > 2011-03-21
>> > (1 row)
>> >
>> > regression=# select to_date('110321', 'YYMMDD');
>> > to_date
>> > ------------
>> > 2011-03-21
>> > (1 row)
>> >
>> > If you break the latter case, I am sure the villagers will be on your
>> > doorstep shortly.
>>
>> Oh, dear. No wonder this code is so hard to get right.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>
> --
> --
> Piyush S Newe
> Principal Engineer
> EnterpriseDB
> office: +91 20 3058 9500
> www.enterprisedb.com
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
>
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are not
> the intended recipient or authorized to receive this for the intended
> recipient, any use, dissemination, distribution, retention, archiving, or
> copying of this communication is strictly prohibited. If you have received
> this e-mail in error, please notify the sender immediately by reply e-mail
> and delete this message.
>
>

--
--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-09-05 23:32:27
Message-ID: 201109052332.p85NWRn26034@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Piyush Newe wrote:
> Hi,
>
> I was randomly testing some date related stuff on PG & observed that the
> outputs were wrong.
>
> e.g.
> postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
> to_date
> ------------
> 3910-01-01 <--------- Look at this
> (1 row)
>
> postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY');
> to_date
> ------------
> 2010-01-01
> (1 row)

I have done some work on this problem, and have developed the attached
patch. It genarates the output in the final column of this table:

Oracle PostgreSQL With PG Patch
1 TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001+
2 TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
3 TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
4 TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
5 TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 01-JAN-2010
6 TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
7 TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
8 TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010
9 TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 01-JAN-2067
10 TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 01-JAN-2111*+
11 TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-1678+
12 TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
13 TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 01-JAN-2010*
14 TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 01-JAN-2010*
15 TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 01-JAN-2010*
16 TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

I marked with '*' every case where the patch doesn't match current PG,
and used a '+' to mark every case where it doesn't match Oracle.

I know Tom was worried that because the year field took more digits than
specified, it would prevent numeric columns from being pulled apart, but
our code has this check:

if (S_FM(node->suffix) || is_next_separator(node))
{
/*
* This node is in Fill Mode, or the next node is known to be a
* non-digit value, so we just slurp as many characters as we can get.
*/
errno = 0;
result = strtol(init, src, 10);
}

The reason these tests are accepting an unlimited number of digits is
because it is at the end of the string. If you place a digit field
right after it, it will not use more characters than specified:

test=> select to_date('9876', 'YYY');
to_date
------------
9876-01-01
(1 row)

test=> select to_date('9876', 'YYYMM');
to_date
------------
1987-06-01
(1 row)

Yes, not documented, but I assume the coder was trying to be helpful.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/date_era.diff text/x-diff 2.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-09-06 14:01:37
Message-ID: 201109061401.p86E1bL15981@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Piyush Newe wrote:
> > Hi,
> >
> > I was randomly testing some date related stuff on PG & observed that the
> > outputs were wrong.
> >
> > e.g.
> > postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
> > to_date
> > ------------
> > 3910-01-01 <--------- Look at this
> > (1 row)
> >
> > postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY');
> > to_date
> > ------------
> > 2010-01-01
> > (1 row)
>
> I have done some work on this problem, and have developed the attached
> patch. It genarates the output in the final column of this table:
>
> Oracle PostgreSQL With PG Patch
> 1 TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001+
> 2 TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> 3 TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> 4 TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
> 5 TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 01-JAN-2010
> 6 TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> 7 TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> 8 TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010
> 9 TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 01-JAN-2067
> 10 TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 01-JAN-2111*+
> 11 TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-1678+
> 12 TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
> 13 TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 01-JAN-2010*
> 14 TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 01-JAN-2010*
> 15 TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 01-JAN-2010*
> 16 TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In an attempt to make the to_date/to_timestamp behavior documentable, I
have modified the patch to have dates adjust toward the year 2020, and
added code so if four digits are supplied, we don't do any adjustment.
Here is the current odd behavior, which is fixed by the patch:

test=> select to_date('222', 'YYY');
to_date
------------
2222-01-01
(1 row)

test=> select to_date('0222', 'YYY');
to_date
------------
2222-01-01
(1 row)

If they supply a full 4-digit year, it seems we should honor that, even
for YYY. YYYY still does no adjustment, and I doubt we want to change
that:

test=> select to_date('222', 'YYYY');
to_date
------------
0222-01-01
(1 row)

test=> select to_date('0222', 'YYYY');
to_date
------------
0222-01-01
(1 row)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/date_era.diff text/x-diff 3.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Piyush Newe <piyush(dot)newe(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rectifying wrong Date outputs
Date: 2011-09-07 13:51:04
Message-ID: 201109071351.p87Dp4l26991@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Applied, with a function rename. The only odd case we have left is:

test=> select to_date('079', 'YYY');
to_date
------------
1979-01-01
(1 row)

(Note the zero is ignored.) I can't see an easy way to fix this and
continue to be easily documented.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Piyush Newe wrote:
> > > Hi,
> > >
> > > I was randomly testing some date related stuff on PG & observed that the
> > > outputs were wrong.
> > >
> > > e.g.
> > > postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
> > > to_date
> > > ------------
> > > 3910-01-01 <--------- Look at this
> > > (1 row)
> > >
> > > postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY');
> > > to_date
> > > ------------
> > > 2010-01-01
> > > (1 row)
> >
> > I have done some work on this problem, and have developed the attached
> > patch. It genarates the output in the final column of this table:
> >
> > Oracle PostgreSQL With PG Patch
> > 1 TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001+
> > 2 TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> > 3 TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
> > 4 TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
> > 5 TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 01-JAN-2010
> > 6 TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> > 7 TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
> > 8 TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010
> > 9 TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 01-JAN-2067
> > 10 TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 01-JAN-2111*+
> > 11 TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-1678+
> > 12 TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001
> > 13 TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 01-JAN-2010*
> > 14 TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 01-JAN-2010*
> > 15 TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 01-JAN-2010*
> > 16 TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
>
> In an attempt to make the to_date/to_timestamp behavior documentable, I
> have modified the patch to have dates adjust toward the year 2020, and
> added code so if four digits are supplied, we don't do any adjustment.
> Here is the current odd behavior, which is fixed by the patch:
>
> test=> select to_date('222', 'YYY');
> to_date
> ------------
> 2222-01-01
> (1 row)
>
> test=> select to_date('0222', 'YYY');
> to_date
> ------------
> 2222-01-01
> (1 row)
>
> If they supply a full 4-digit year, it seems we should honor that, even
> for YYY. YYYY still does no adjustment, and I doubt we want to change
> that:
>
> test=> select to_date('222', 'YYYY');
> to_date
> ------------
> 0222-01-01
> (1 row)
>
> test=> select to_date('0222', 'YYYY');
> to_date
> ------------
> 0222-01-01
> (1 row)
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +

[ text/x-diff is unsupported, treating like TEXT/PLAIN ]

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> new file mode 100644
> index c03dd6c..282bb0d
> *** a/doc/src/sgml/func.sgml
> --- b/doc/src/sgml/func.sgml
> *************** SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1
> *** 5550,5555 ****
> --- 5550,5564 ----
>
> <listitem>
> <para>
> + If the year format specification is less than four digits, e.g.
> + <literal>YYY</>, and the supplied year is less than four digits,
> + the year will be adjusted to be nearest to year 2020, e.g.
> + <literal>95</> becomes 1995.
> + </para>
> + </listitem>
> +
> + <listitem>
> + <para>
> The <literal>YYYY</literal> conversion from string to <type>timestamp</type> or
> <type>date</type> has a restriction when processing years with more than 4 digits. You must
> use some non-digit character or template after <literal>YYYY</literal>,
> diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
> new file mode 100644
> index 726a1f4..1a3ec1c
> *** a/src/backend/utils/adt/formatting.c
> --- b/src/backend/utils/adt/formatting.c
> *************** static void dump_node(FormatNode *node,
> *** 964,969 ****
> --- 964,970 ----
>
> static char *get_th(char *num, int type);
> static char *str_numth(char *dest, char *num, int type);
> + static int add_era_to_partial_year(int year);
> static int strspace_len(char *str);
> static int strdigits_len(char *str);
> static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
> *************** is_next_separator(FormatNode *n)
> *** 1968,1973 ****
> --- 1969,1999 ----
> return TRUE; /* some non-digit input (separator) */
> }
>
> +
> + static int
> + add_era_to_partial_year(int year)
> + {
> + /*
> + * Adjust all dates toward 2020; this is effectively what happens
> + * when we assume '70' is 1970 and '69' is 2069.
> + */
> + /* Force 0-69 into the 2000's */
> + if (year < 70)
> + return year + 2000;
> + /* Force 70-99 into the 1900's */
> + else if (year >= 70 && year < 100)
> + return year + 1900;
> + /* Force 100-519 into the 2000's */
> + else if (year >= 100 && year < 519)
> + return year + 2000;
> + /* Force 520-999 into the 1000's */
> + else if (year >= 520 && year < 1000)
> + return year + 1000;
> + else
> + return year;
> + }
> +
> +
> static int
> strspace_len(char *str)
> {
> *************** DCH_from_char(FormatNode *node, char *in
> *** 2930,2972 ****
> break;
> case DCH_YYY:
> case DCH_IYY:
> ! from_char_parse_int(&out->year, &s, n);
> out->yysz = 3;
> -
> - /*
> - * 3-digit year: '100' ... '999' = 1100 ... 1999 '000' ...
> - * '099' = 2000 ... 2099
> - */
> - if (out->year >= 100)
> - out->year += 1000;
> - else
> - out->year += 2000;
> s += SKIP_THth(n->suffix);
> break;
> case DCH_YY:
> case DCH_IY:
> ! from_char_parse_int(&out->year, &s, n);
> out->yysz = 2;
> -
> - /*
> - * 2-digit year: '00' ... '69' = 2000 ... 2069 '70' ... '99'
> - * = 1970 ... 1999
> - */
> - if (out->year < 70)
> - out->year += 2000;
> - else
> - out->year += 1900;
> s += SKIP_THth(n->suffix);
> break;
> case DCH_Y:
> case DCH_I:
> ! from_char_parse_int(&out->year, &s, n);
> out->yysz = 1;
> -
> - /*
> - * 1-digit year: always +2000
> - */
> - out->year += 2000;
> s += SKIP_THth(n->suffix);
> break;
> case DCH_RM:
> --- 2956,2978 ----
> break;
> case DCH_YYY:
> case DCH_IYY:
> ! if (from_char_parse_int(&out->year, &s, n) < 4)
> ! out->year = add_era_to_partial_year(out->year);
> out->yysz = 3;
> s += SKIP_THth(n->suffix);
> break;
> case DCH_YY:
> case DCH_IY:
> ! if (from_char_parse_int(&out->year, &s, n) < 4)
> ! out->year = add_era_to_partial_year(out->year);
> out->yysz = 2;
> s += SKIP_THth(n->suffix);
> break;
> case DCH_Y:
> case DCH_I:
> ! if (from_char_parse_int(&out->year, &s, n) < 4)
> ! out->year = add_era_to_partial_year(out->year);
> out->yysz = 1;
> s += SKIP_THth(n->suffix);
> break;
> case DCH_RM:

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +