Re: [PATCHES] allow CSV quote in NULL

Lists: pgsql-hackerspgsql-patches
From: Stephen Frost <sfrost(at)snowman(dot)net>
To: pgsql-patches(at)postgresql(dot)org
Subject: allow CSV quote in NULL
Date: 2007-07-27 04:36:54
Message-ID: 20070727043653.GD4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Greetings,

Please find attached a minor patch to remove the constraints that a
user can't include the delimiter or quote characters in a 'NULL AS'
string when importing CSV files.

This allows a user to explicitly request that NULL conversion happen
on fields which are quoted. As the quote character is being allowed
to be in the 'NULL AS' string now, there's no reason to exclude the
delimiter character from being seen in that string as well, though
unless quoted using the CSV quote character it won't ever be matched.

An example of the usage:

sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv
with csv header quote as '"' null as '""'

This is no contrived example, it's an issue I ran into earlier today
when I got a file which had (for reasons unknown to me and not easily
changed upstream):

"1","V","WASHDCABC12","","120033"...

Both of the ending columns shown are integer fields, the "" here being
used to indicate a NULL value.

Without the patch, an ERROR occurs:

sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv
with csv header quote as '"'
ERROR: invalid input syntax for integer: ""

And there's no way to get it to import with COPY CSV mode. The
patch adds this ability without affecting existing usage or changing
the syntax. Even with the patch an ERROR occurs with the default
treatment of CSV files:

sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv
with csv header quote as '"'
ERROR: invalid input syntax for integer: ""

Which would be expected. If the file is modified to remove the ""s
for NULL columns, it imports just fine with the syntax above.

It'd be really nice to have this included.

Thanks!

Stephen

Attachment Content-Type Size
pgsql.csv.diff text/x-diff 2.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: allow CSV quote in NULL
Date: 2007-07-27 04:51:56
Message-ID: 7386.1185511916@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Please find attached a minor patch to remove the constraints that a
> user can't include the delimiter or quote characters in a 'NULL AS'
> string when importing CSV files.

This can't really be sane can it?

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: allow CSV quote in NULL
Date: 2007-07-27 05:03:27
Message-ID: 20070727050327.GI4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > Please find attached a minor patch to remove the constraints that a
> > user can't include the delimiter or quote characters in a 'NULL AS'
> > string when importing CSV files.
>
> This can't really be sane can it?

hm? It's a problem I've run into a number of times and I finally got
fed up with it enough to create a patch for it. It solves my problem
and I don't believe breaks or adversely affects anything else. It also
has to be explicitly asked for by the user, and when the user asks for
it, what's done is intuitively what would be expected.

I don't see the logic in forbidding the user from being able to do
this... My specific case involved an integer field which was quoted in
the CSV file (which isn't generally a problem, it's only an issue if it
ends up being a an empty string). I could easily see it being used for
text fields as well though, if, for example, they want empty strings,
even when quoted, to import as NULLs. Of course, the default remains
that it'd be imported as an empty string, having it be a NULL has to be
explicitly requested.

Thanks,

Stephen


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen Frost" <sfrost(at)snowman(dot)net>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: allow CSV quote in NULL
Date: 2007-07-27 09:22:42
Message-ID: 87vec6p5el.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> Please find attached a minor patch to remove the constraints that a
>> user can't include the delimiter or quote characters in a 'NULL AS'
>> string when importing CSV files.
>
> This can't really be sane can it?

Including unquoted delimiter characters seems kind of insane. But including
the quote characters or quoted delimiter characters seems reasonable.

The alternative would be interpreting NULL strings after dequoting but that
would leave no way to include the NULL string literally. This solution means
there's no way to include it (if it needs quoting) but only when you specify
it this way.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: allow CSV quote in NULL
Date: 2007-07-27 11:38:47
Message-ID: 20070727113847.GJ4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

* Gregory Stark (stark(at)enterprisedb(dot)com) wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Stephen Frost <sfrost(at)snowman(dot)net> writes:
> >> Please find attached a minor patch to remove the constraints that a
> >> user can't include the delimiter or quote characters in a 'NULL AS'
> >> string when importing CSV files.
> >
> > This can't really be sane can it?
>
> Including unquoted delimiter characters seems kind of insane. But including
> the quote characters or quoted delimiter characters seems reasonable.

Right. We could write something to check if the user provides an
unquoted delimiter character but I'm not really sure it's worth it, to
be perfectly honest. If it'll make people happier with the patch I can
do it though.

> The alternative would be interpreting NULL strings after dequoting but that
> would leave no way to include the NULL string literally. This solution means
> there's no way to include it (if it needs quoting) but only when you specify
> it this way.

Yeah, interpreting NULLs after dequoting means you've lost the
information about if it's quoted or not, or you have to add some funky
syntax to say "if it's quoted, do it differently...", which is no good,
imv.

What the patch does basically is say "give us the exact string that
shows up between the unquoted delimiters that you want to be treated
as a NULL." This removes the complexity of the question about quoting,
unquoting, whatever, and makes it a very clear-cut, straight-forward
solution with no impact on existing users, imv.

Thanks,

Stephen


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-27 16:03:19
Message-ID: 46AA1747.4030908@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


[redirecting to -hackers]

Stephen Frost wrote:
> * Gregory Stark (stark(at)enterprisedb(dot)com) wrote:
>
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>
>>
>>> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>>>
>>>> Please find attached a minor patch to remove the constraints that a
>>>> user can't include the delimiter or quote characters in a 'NULL AS'
>>>> string when importing CSV files.
>>>>
>>> This can't really be sane can it?
>>>
>

Not very, no :-)
>
>> The alternative would be interpreting NULL strings after dequoting but that
>> would leave no way to include the NULL string literally. This solution means
>> there's no way to include it (if it needs quoting) but only when you specify
>> it this way.
>>
>
> Yeah, interpreting NULLs after dequoting means you've lost the
> information about if it's quoted or not, or you have to add some funky
> syntax to say "if it's quoted, do it differently...", which is no good,
> imv.
>
> What the patch does basically is say "give us the exact string that
> shows up between the unquoted delimiters that you want to be treated
> as a NULL." This removes the complexity of the question about quoting,
> unquoting, whatever, and makes it a very clear-cut, straight-forward
> solution with no impact on existing users, imv.
>
>
>

This looks too clever by half, to me. Someone facing the problem you are
facing would have to dig quite deep to find the solution you're promoting.

A much better way IMNSHO would be to add an extra FORCE switch. On
input, FORCE NOT NULL says to treat an unquoted null as the literal
value rather than as a null field for the columns named. The reverse
would be to tell it to treat a quoted null as null rather than as the
literal value, for the named columns. Perhaps that should just be "FORCE
NULL columnlist". It would be more explicit and at the same time would
only apply to the named columns, rather than discarding totally the
ability to distinguish between null and not null values.

This should probably be discussed on -hackers, anyway.

cheers

andrew


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-27 18:09:52
Message-ID: 20070727180951.GK4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

* Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
> This looks too clever by half, to me. Someone facing the problem you are
> facing would have to dig quite deep to find the solution you're promoting.

Oddly enough, it was one of the first things I tried when I discovered
it wasn't just realizing that ,"", for an integer column meant NULL (and
instead was complaining loudly that you can't convert an empty string
into an integer). It's also pretty clear, to me at least, to say
"put the exact string that shows up between the delimiters here
that you want treated as a NULL" rather than "well, if it's a column
which is quoted then you have to jump through these hoops and tell PG
about each one, but if it's not quoted you have to do this", etc, etc.

> A much better way IMNSHO would be to add an extra FORCE switch. On input,
> FORCE NOT NULL says to treat an unquoted null as the literal value rather
> than as a null field for the columns named. The reverse would be to tell it
> to treat a quoted null as null rather than as the literal value, for the
> named columns. Perhaps that should just be "FORCE NULL columnlist". It
> would be more explicit and at the same time would only apply to the named
> columns, rather than discarding totally the ability to distinguish between
> null and not null values.

I don't see that it needs to be 'more explicit', that's just silly.
Either the user indicated they want it, or they didn't. What you're
suggesting adds in a bunch of, imv, unnecessary complication and ends up
making the resulting code that much bigger and uglier for not much gain.

I'm honestly not a big fan of the "columnlist" approach that's been
taken with the options. While I understand the desire to seperate the
parsing from the typing, making the users essentially do that association
for us by way of making them specify how to handle each column explicitly
is worse than just accepting that different types may need to be handled
in different ways.

We could instead flip it around and force the users to specify, for
each column, what, exactly, should be done for that column by having
them specify a regexp for that column. The regexp would implicitly have
the delimiter on each side of it and we'd just step through the string
matching as far as we can for each column. Then it's nice and explicit
for everyone but probably not much fun to use.

> This should probably be discussed on -hackers, anyway.

As a small, unobtrusive patch, I felt it didn't need a long discussion
about what everyone's CSV files look like and how "that just shouldn't
be done" or "that's just not sane."

Thanks,

Stephen


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-27 18:26:26
Message-ID: 46AA38D2.4070908@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Stephen Frost wrote:
>
> I'm honestly not a big fan of the "columnlist" approach that's been
> taken with the options. While I understand the desire to seperate the
> parsing from the typing, making the users essentially do that association
> for us by way of making them specify how to handle each column explicitly
> is worse than just accepting that different types may need to be handled
> in different ways.
>

Whether or not you like it, the fact is it's there. I think any solution
should be consistent with what is done.

When CSV was first discussed we looked at doing type-specific behaviour.
The end of the long debate was that we simply couldn't do that safely,
and the only recourse was to require the user to specify the behaviour
required if it differed from the default. You might be inclined to want
to revisit that, but I am not.

cheers

andrew


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-27 19:19:17
Message-ID: 20070727191917.GL4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

* Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
> Stephen Frost wrote:
>> I'm honestly not a big fan of the "columnlist" approach that's been
>> taken with the options. While I understand the desire to seperate the
>> parsing from the typing, making the users essentially do that association
>> for us by way of making them specify how to handle each column explicitly
>> is worse than just accepting that different types may need to be handled
>> in different ways.
>>
>
> Whether or not you like it, the fact is it's there. I think any solution
> should be consistent with what is done.

Other, unrelated, options being or not being there doesn't really have
any bearing on this though. I'm not inventing new syntax here. I'm
just removing a restriction on what the user can do that doesn't need
to exist. Indeed, other more convoluted and complex things could still
be added, if someone wants them, this doesn't prevent that.

Thanks,

Stephen


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: allow CSV quote in NULL
Date: 2007-07-29 23:17:15
Message-ID: 20070729231715.GB4908@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Stephen Frost wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > > Please find attached a minor patch to remove the constraints that a
> > > user can't include the delimiter or quote characters in a 'NULL AS'
> > > string when importing CSV files.
> >
> > This can't really be sane can it?
>
> hm? It's a problem I've run into a number of times and I finally got
> fed up with it enough to create a patch for it.

I've seen people in the same situation before, so this makes sense from
that perspective. If there's a vote, the patch gets a +1 from me.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-31 16:22:40
Message-ID: 25923.1185898960@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Other, unrelated, options being or not being there doesn't really have
> any bearing on this though. I'm not inventing new syntax here. I'm
> just removing a restriction on what the user can do that doesn't need
> to exist.

I don't think you're "just removing a restriction". What you're doing
is exposing a whole lot of strange and arguably broken corner cases.
If we accept this patch I think we'll be fielding bug reports as a
result for years to come. I *especially* dislike the part about
allowing the delimiter character in the null string --- that will allow
people to complain about the order in which decisions are made.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-31 16:55:35
Message-ID: 46AF6987.1080508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>
>> Other, unrelated, options being or not being there doesn't really have
>> any bearing on this though. I'm not inventing new syntax here. I'm
>> just removing a restriction on what the user can do that doesn't need
>> to exist.
>>
>
> I don't think you're "just removing a restriction". What you're doing
> is exposing a whole lot of strange and arguably broken corner cases.
> If we accept this patch I think we'll be fielding bug reports as a
> result for years to come. I *especially* dislike the part about
> allowing the delimiter character in the null string --- that will allow
> people to complain about the order in which decisions are made.
>
>
>

Yeah, if you allow the delimiter in the null string, what do you do if
it's not quoted? I can't imagine what the real world case for that could
possibly be.

Even if there's an arguable case for allowing the quote char in a null
string (and as I indicated upthread I really think the problem being
addressed here could be solved in a far better fashion) there is surely
no good case for allowing the delimiter. Oh, and if we did allow the
quote char we should surely only allow it on input - just because other
programs produce absurd output there is not reason we should.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-31 17:21:20
Message-ID: 26707.1185902480@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> ... Oh, and if we did allow the
> quote char we should surely only allow it on input - just because other
> programs produce absurd output there is not reason we should.

Yeah. The *real* problem with the patch as proposed is that it allows a
COPY OUT to emit a file that cannot be reloaded correctly, even given
the same options used to prepare it. I think that the restrictions were
put there more to prevent that scenario than to restrict COPY IN.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-31 18:52:39
Message-ID: 20070731185239.GP4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > ... Oh, and if we did allow the
> > quote char we should surely only allow it on input - just because other
> > programs produce absurd output there is not reason we should.
>
> Yeah. The *real* problem with the patch as proposed is that it allows a
> COPY OUT to emit a file that cannot be reloaded correctly, even given
> the same options used to prepare it. I think that the restrictions were
> put there more to prevent that scenario than to restrict COPY IN.

erp. My apologies, I hadn't ever intended for this to be used with COPY
OUT. For some reason I had thought my changes were isolated to the COPY
CSV IN path. I'd be happy to adjust the patch to only accept the
quote-in-null syntax when doing a COPY CSV IN.

Thanks,

Stephen


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: allow CSV quote in NULL
Date: 2007-09-26 08:38:54
Message-ID: 200709260838.l8Q8csB11970@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Stephen Frost wrote:
> Greetings,
>
> Please find attached a minor patch to remove the constraints that a
> user can't include the delimiter or quote characters in a 'NULL AS'
> string when importing CSV files.
>
> This allows a user to explicitly request that NULL conversion happen
> on fields which are quoted. As the quote character is being allowed
> to be in the 'NULL AS' string now, there's no reason to exclude the
> delimiter character from being seen in that string as well, though
> unless quoted using the CSV quote character it won't ever be matched.
>
> An example of the usage:
>
> sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv
> with csv header quote as '"' null as '""'
>
> This is no contrived example, it's an issue I ran into earlier today
> when I got a file which had (for reasons unknown to me and not easily
> changed upstream):
>
> "1","V","WASHDCABC12","","120033"...
>
> Both of the ending columns shown are integer fields, the "" here being
> used to indicate a NULL value.
>
> Without the patch, an ERROR occurs:
>
> sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv
> with csv header quote as '"'
> ERROR: invalid input syntax for integer: ""
>
> And there's no way to get it to import with COPY CSV mode. The
> patch adds this ability without affecting existing usage or changing
> the syntax. Even with the patch an ERROR occurs with the default
> treatment of CSV files:
>
> sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv
> with csv header quote as '"'
> ERROR: invalid input syntax for integer: ""
>
> Which would be expected. If the file is modified to remove the ""s
> for NULL columns, it imports just fine with the syntax above.
>
> It'd be really nice to have this included.
>
> Thanks!
>
> Stephen

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2008-03-11 22:47:54
Message-ID: 200803112247.m2BMlsO14982@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Added to TODO for COPY:

o Allow COPY in CSV mode to control whether "" is treated as NULL

http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php

---------------------------------------------------------------------------

Andrew Dunstan wrote:
>
> [redirecting to -hackers]
>
> Stephen Frost wrote:
> > * Gregory Stark (stark(at)enterprisedb(dot)com) wrote:
> >
> >> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >>
> >>
> >>> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> >>>
> >>>> Please find attached a minor patch to remove the constraints that a
> >>>> user can't include the delimiter or quote characters in a 'NULL AS'
> >>>> string when importing CSV files.
> >>>>
> >>> This can't really be sane can it?
> >>>
> >
>
> Not very, no :-)
> >
> >> The alternative would be interpreting NULL strings after dequoting but that
> >> would leave no way to include the NULL string literally. This solution means
> >> there's no way to include it (if it needs quoting) but only when you specify
> >> it this way.
> >>
> >
> > Yeah, interpreting NULLs after dequoting means you've lost the
> > information about if it's quoted or not, or you have to add some funky
> > syntax to say "if it's quoted, do it differently...", which is no good,
> > imv.
> >
> > What the patch does basically is say "give us the exact string that
> > shows up between the unquoted delimiters that you want to be treated
> > as a NULL." This removes the complexity of the question about quoting,
> > unquoting, whatever, and makes it a very clear-cut, straight-forward
> > solution with no impact on existing users, imv.
> >
> >
> >
>
> This looks too clever by half, to me. Someone facing the problem you are
> facing would have to dig quite deep to find the solution you're promoting.
>
> A much better way IMNSHO would be to add an extra FORCE switch. On
> input, FORCE NOT NULL says to treat an unquoted null as the literal
> value rather than as a null field for the columns named. The reverse
> would be to tell it to treat a quoted null as null rather than as the
> literal value, for the named columns. Perhaps that should just be "FORCE
> NULL columnlist". It would be more explicit and at the same time would
> only apply to the named columns, rather than discarding totally the
> ability to distinguish between null and not null values.
>
> This should probably be discussed on -hackers, anyway.
>
>
>
> cheers
>
> andrew
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +