Re: [BUGS] surprising to_timestamp behavior

Lists: pgsql-bugspgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: surprising to_timestamp behavior
Date: 2013-10-29 15:24:49
Message-ID: CA+TgmobO2Jc2meW0tWpcNcX0Q3AoddcfWwKvBD+tp3VNPef_7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

It turns out that when you use the to_timestamp function, a space in
the format mask can result in skipping any character at all, even a
digit, in the input string. Consider this example, where 10 hours are
lost:

rhaas=# select to_timestamp('2013-10-29 10:47:18', 'YYYY-MM-DD HH24:MI:SS');
to_timestamp
------------------------
2013-10-29 00:47:18-04
(1 row)

There's already some code in the relevant function (DCH_from_char) to
skip whitespace characters, but that happens *in addition* to the
mandatory one-character skip. I suggest that we revamp the logic so
that we *either* skip whitespace *or* skip exactly one character from
the input string - but not both. That fixes this case, because the
first space in the mask matches the input space, and the second one is
allowed to match nothing instead of being forced to consume the "1"
following the space. See attached patch.

One problem with this is that it breaks the regression tests. Right
now, this works:

SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
'"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');

First, the quoted string matches exactly. Next, the subsequent space
in the format mask skips the colon in the input. And now it's trying
to match YYYY against " 1976", and it's happy to ignore that leading
space as part of decoding a numeric field. With the proposed patch,
we're no longer willing to match the colon in the input to the space
in the format mask, so it errors out, complaining that the colon
doesn't look like a number. I tend to think that's more correct than
what we're doing now.

The other regression test that fails is:

SELECT to_timestamp('15 "text between quote marks" 98 54 45',
E'HH24 "\\text between quote marks\\"" YY MI SS');

With the patch, this fails with:

ERROR: invalid value "rk" for "YY"

The problem is basically that we're quite happy to skip quoted text
that doesn't match at all, as long as it's the same length. And the
format string has extra cruft in it, so they're not, and we end up in
the wrong spot. Either I'm missing something, or our current behavior
here is nothing short of bizarre.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
to_timestamp_ws.patch text/x-patch 793 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: surprising to_timestamp behavior
Date: 2013-10-29 16:03:13
Message-ID: 2726.1383062593@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It turns out that when you use the to_timestamp function, a space in
> the format mask can result in skipping any character at all, even a
> digit, in the input string. Consider this example, where 10 hours are
> lost:

> rhaas=# select to_timestamp('2013-10-29 10:47:18', 'YYYY-MM-DD HH24:MI:SS');
> to_timestamp
> ------------------------
> 2013-10-29 00:47:18-04
> (1 row)

And that's a bug why? The format says to ignore two characters before the
hours field. I think you're proposing to remove important functionality.

To refine the point a bit, it's absolutely stupid to be using to_timestamp
at all for sane input data like this example. Just cast the string to
timestamp(tz), and the standard datatype input function will do a better
job than to_timestamp ever would. The point of to_timestamp, IMNSHO,
is to extract data successfully from weirdly formatted input; which might
well include cases where there are stray digits you don't want taken as
data. So I'm not on board with proposals to "fix" cases like this by
making the format string's meaning squishier.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: surprising to_timestamp behavior
Date: 2013-10-29 17:35:20
Message-ID: CA+TgmobE0DoqF8W7=AmSiiJMfDWg5LMZhE0TgG+cOEJ-JF=uug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Oct 29, 2013 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It turns out that when you use the to_timestamp function, a space in
>> the format mask can result in skipping any character at all, even a
>> digit, in the input string. Consider this example, where 10 hours are
>> lost:
>
>> rhaas=# select to_timestamp('2013-10-29 10:47:18', 'YYYY-MM-DD HH24:MI:SS');
>> to_timestamp
>> ------------------------
>> 2013-10-29 00:47:18-04
>> (1 row)
>
> And that's a bug why? The format says to ignore two characters before the
> hours field. I think you're proposing to remove important functionality.
>
> To refine the point a bit, it's absolutely stupid to be using to_timestamp
> at all for sane input data like this example. Just cast the string to
> timestamp(tz), and the standard datatype input function will do a better
> job than to_timestamp ever would. The point of to_timestamp, IMNSHO,
> is to extract data successfully from weirdly formatted input; which might
> well include cases where there are stray digits you don't want taken as
> data. So I'm not on board with proposals to "fix" cases like this by
> making the format string's meaning squishier.

Well, you're the second person to react that way to this proposal, but
the current behavior seems mighty odd to me - even odder, now that I
realize that we'll happily match '"cat'" to 'dog'. I just work here,
though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: surprising to_timestamp behavior
Date: 2013-10-31 05:20:58
Message-ID: CAM2+6=UooEhX+CMx+9CUAvKifqvhiT=XRn4KR5aF6Hhqj6Jmhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Oct 29, 2013 at 11:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Oct 29, 2013 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> It turns out that when you use the to_timestamp function, a space in
> >> the format mask can result in skipping any character at all, even a
> >> digit, in the input string. Consider this example, where 10 hours are
> >> lost:
> >
> >> rhaas=# select to_timestamp('2013-10-29 10:47:18', 'YYYY-MM-DD
> HH24:MI:SS');
> >> to_timestamp
> >> ------------------------
> >> 2013-10-29 00:47:18-04
> >> (1 row)
> >
> > And that's a bug why? The format says to ignore two characters before
> the
> > hours field. I think you're proposing to remove important functionality.
> >
> > To refine the point a bit, it's absolutely stupid to be using
> to_timestamp
> > at all for sane input data like this example. Just cast the string to
> > timestamp(tz), and the standard datatype input function will do a better
> > job than to_timestamp ever would. The point of to_timestamp, IMNSHO,
> > is to extract data successfully from weirdly formatted input; which might
> > well include cases where there are stray digits you don't want taken as
> > data. So I'm not on board with proposals to "fix" cases like this by
> > making the format string's meaning squishier.
>
> Well, you're the second person to react that way to this proposal, but
> the current behavior seems mighty odd to me - even odder, now that I
> realize that we'll happily match '"cat'" to 'dog'. I just work here,
> though.
>

Well, I agree with Tom that user provided two spaces to skip before hours
and this is what we are exactly doing.

Still here are few other observations:

(1) I don't see following as wrong output in postgresql as I already said
above and agreed with Tom. (in input, only one space between DD and HH24,
but
in mask we have 2 spaces)

postgres=# select to_timestamp('2011-03-18 23:38:15', 'YYYY-MM-DD
HH24:MI:SS');
to_timestamp
---------------------------
2011-03-18 03:38:15+05:30
(1 row)

(Note that, time 23 became 03, due to extra space in mask eating 2 in 23,
resulting in 3 for HH24. But fair enough, as expected and thus NO issues)

(2) But I see following buggy (both in input and mask we have 2 spaces
between DD and HH24)

postgres=# select to_timestamp('2011-03-18 23:38:15', 'YYYY-MM-DD
HH24:MI:SS');
to_timestamp
---------------------------
2011-03-18 03:38:15+05:30
(1 row)

(Note that, this time we should not end up with eating 2 from 23 as we have
exact spaces in mask and input. NOT so good and NOT expected, looks like
BUG)

So I think we need to resolve second case.

Thanks

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] surprising to_timestamp behavior
Date: 2014-01-14 07:24:18
Message-ID: CAM2+6=Uk4j0ibOGYO7rmePHwWLB5ZvXJ4iN7p5LHyY4mrn0OfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Oct 31, 2013 at 10:50 AM, Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

>
>
>
> On Tue, Oct 29, 2013 at 11:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com>wrote:
>
>> On Tue, Oct 29, 2013 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> >> It turns out that when you use the to_timestamp function, a space in
>> >> the format mask can result in skipping any character at all, even a
>> >> digit, in the input string. Consider this example, where 10 hours are
>> >> lost:
>> >
>> >> rhaas=# select to_timestamp('2013-10-29 10:47:18', 'YYYY-MM-DD
>> HH24:MI:SS');
>> >> to_timestamp
>> >> ------------------------
>> >> 2013-10-29 00:47:18-04
>> >> (1 row)
>> >
>> > And that's a bug why? The format says to ignore two characters before
>> the
>> > hours field. I think you're proposing to remove important
>> functionality.
>> >
>> > To refine the point a bit, it's absolutely stupid to be using
>> to_timestamp
>> > at all for sane input data like this example. Just cast the string to
>> > timestamp(tz), and the standard datatype input function will do a better
>> > job than to_timestamp ever would. The point of to_timestamp, IMNSHO,
>> > is to extract data successfully from weirdly formatted input; which
>> might
>> > well include cases where there are stray digits you don't want taken as
>> > data. So I'm not on board with proposals to "fix" cases like this by
>> > making the format string's meaning squishier.
>>
>> Well, you're the second person to react that way to this proposal, but
>> the current behavior seems mighty odd to me - even odder, now that I
>> realize that we'll happily match '"cat'" to 'dog'. I just work here,
>> though.
>>
>
> Well, I agree with Tom that user provided two spaces to skip before hours
> and this is what we are exactly doing.
>
> Still here are few other observations:
>
>
> (1) I don't see following as wrong output in postgresql as I already said
> above and agreed with Tom. (in input, only one space between DD and HH24,
> but
> in mask we have 2 spaces)
>
> postgres=# select to_timestamp('2011-03-18 23:38:15', 'YYYY-MM-DD
> HH24:MI:SS');
> to_timestamp
> ---------------------------
> 2011-03-18 03:38:15+05:30
> (1 row)
>
> (Note that, time 23 became 03, due to extra space in mask eating 2 in 23,
> resulting in 3 for HH24. But fair enough, as expected and thus NO issues)
>
>
> (2) But I see following buggy (both in input and mask we have 2 spaces
> between DD and HH24)
>
> postgres=# select to_timestamp('2011-03-18 23:38:15', 'YYYY-MM-DD
> HH24:MI:SS');
> to_timestamp
> ---------------------------
> 2011-03-18 03:38:15+05:30
> (1 row)
>
> (Note that, this time we should not end up with eating 2 from 23 as we have
> exact spaces in mask and input. NOT so good and NOT expected, looks like
> BUG)
>
> So I think we need to resolve second case.
>

Attached patch which fixes this issue.

I have just tweaked the code around ignoring spaces in DCH_from_char()
function.

Adding to CommitFest 2014-01 (Open).

Thanks

> Thanks
>
>
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> --
>> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-bugs
>>
>
> --
> Jeevan B Chalke
> Principal Software Engineer, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
>

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] surprising to_timestamp behavior
Date: 2014-01-17 19:36:25
Message-ID: 9795.1389987385@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> writes:
> Attached patch which fixes this issue.

> I have just tweaked the code around ignoring spaces in DCH_from_char()
> function.

> Adding to CommitFest 2014-01 (Open).

I went to review this, and found that there's not actually a patch
attached ...

regards, tom lane


From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] surprising to_timestamp behavior
Date: 2014-01-20 06:23:58
Message-ID: CAM2+6=WGtL-_X0j9QJnYrHUWPxAzROzh_7HX3y1obKi7BXefDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

>
> I went to review this, and found that there's not actually a patch
> attached ...
>
> regards, tom lane
>

Attached. Sorry for that.

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
fix_surprising_to_timestamp_behavior.patch text/x-diff 4.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] surprising to_timestamp behavior
Date: 2014-01-20 18:53:33
Message-ID: 32524.1390244013@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> writes:
>> I went to review this, and found that there's not actually a patch
>> attached ...

> Attached. Sorry for that.

This looks good to me except for one thing: if the upcoming node is a
DCH_FX action (ie, turn on fx_mode), I don't think we want to skip
whitespace. The original coding had that bug too, but it was only
exposed if the FX prefix was immediately preceded by whitespace in
the format.

It's a bit hard to think of useful cases where this would make a
difference, because it turns out that from_char_parse_int contains
an internal skip-whitespace action that's applied regardless of FX
mode, so it really only matters for non-integer field types. I
kinda think that maybe now we should remove that extra skip, or at
least count the skipped whitespace against the field width, but
did not experiment to see what the results would be.

Committed with that change and some cosmetic adjustments.

regards, tom lane