From: | Brendan Jurd <direvus(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeremy Ford <jeremford(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #4862: different results in to_date() between 8.3.7 & 8.4.RC1 |
Date: | 2009-06-19 15:33:51 |
Message-ID: | 37ed240d0906190833u19274495nae74a841fef19bd5@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
2009/6/19 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> regression=# select to_date(' 2009 07', ' YYYYMM');
> to_date
> ------------
> 2009-07-01
> (1 row)
>
> However, that just begs the question --- it seems that leading space is
> allowed in MM, just not in YYYY. Brendan, is that intentional or is it
> a bug?
>
The reason the space between YYYY and MM is ignored isn't to do with
any special treatment of MM, rather it's to do with special treatment
of the end-of-string. Observe:
postgres=# select to_date('200906 19', 'YYYYMMDD');
to_date
------------
2009-06-19
(1 row)
What's going on here is that from_char_parse_int_len() has two
different approaches to capturing characters from the input. The
normal mode is to pull an exact number of characters out of the
string, as per the format node; for DD we pull 2 characters, for YYYY
we pull 4, and so on. However, if the FM (fill mode) flag is given,
or if the next node is known to be a non-digit character, we take a
much more tolerant approach and let strtol() grab as many characters
as it cares to. [1]
The reason for this technique is that it allows us to get away with
things like this:
postgres=# select to_date('2-6-2009', 'DD-MM-YYYY');
to_date
------------
2009-06-02
(1 row)
Or, to put it another way, the presence of separator characters trumps
a strict character-by-character interpretation of the format string.
The code treats the end-of-string as such a separator, so in your '
MM' example, the code lets strtol() off its leash and all the
remaining characters are fed into the month field.
This special treatment of separators was actually in the code long
before I got my hands on it, and I wasn't keen to change it -- I
feared that flipping that little quirk on its head would cause even
more breakage.
I hope that answers your question. to_date() is by nature a weird
beast with many strange corners in its behaviour, and it's hard to
strike a balance between backwards compatibility and Least
Astonishment. My personal preference would be for a 100% strict
interpretation of the format pattern, and a pox on anyone who has been
relying on sloppy patterns! But that's not very practical. I would
welcome any suggestions for further refinements.
Cheers,
BJ
[1] src/backend/utils/adt/formatting.c:1846
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2009-06-19 16:03:13 | Re: BUG #4862: different results in to_date() between 8.3.7 & 8.4.RC1 |
Previous Message | Tom Lane | 2009-06-19 13:57:05 | Re: BUG #4865: replace function returns null |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2009-06-19 15:40:16 | Re: rc1 tarball contains partially outdated/missing man pages |
Previous Message | Peter Eisentraut | 2009-06-19 15:29:17 | Re: rc1 tarball contains partially outdated/missing man pages |