Patch: FORCE_NULL option for copy COPY in CSV mode

Lists: pgsql-hackers
From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-09-29 08:09:12
Message-ID: CAB8KJ=jS-Um4TGwenS5wLUfJK6K4rNOm_V6GRUj+tcKekL2=GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

This patch implements the following TODO item:

Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULL

Currently this is always treated as a zero-length string,
which generates an error when loading into an integer column

Re: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php

http://wiki.postgresql.org/wiki/Todo#COPY

I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.

I'll add this to the November commitfest.

Regards

Ian Barwick

Attachment Content-Type Size
copy-force-null-v1.patch application/octet-stream 10.0 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-05 11:38:38
Message-ID: CAA4eK1+zwtWZk7RgAcvR=UDyBab_HgEO5cfrLe-XaZRjSeit8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Sep 29, 2013 at 1:39 PM, Ian Lawrence Barwick <barwick(at)gmail(dot)com> wrote:
> Hi,
>
> This patch implements the following TODO item:
>
> Allow COPY in CSV mode to control whether a quoted zero-length
> string is treated as NULL
>
> Currently this is always treated as a zero-length string,
> which generates an error when loading into an integer column
>
> Re: [PATCHES] allow CSV quote in NULL
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
>
>
> http://wiki.postgresql.org/wiki/Todo#COPY
>
>
> I had a very definite use-case for this functionality recently while importing
> CSV files generated by Oracle, and was somewhat frustrated by the existence
> of a FORCE_NOT_NULL option for specific columns, but not one for
> FORCE_NULL.

While going through documentation of this patch to understand it's
usage, I found a small mistake.

+ Force the specified columns' values to be converted to <literal>NULL</>
+ if the value contains an empty string.

It seems quote after columns is wrong.

Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:

postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 50,
>> \.
postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-07 19:06:42
Message-ID: CA+TgmobWtajy_TJz5XDQ-XJy0QKoXgTcws3W5_g2sEC3bOHhMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 5, 2013 at 7:38 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sun, Sep 29, 2013 at 1:39 PM, Ian Lawrence Barwick <barwick(at)gmail(dot)com> wrote:
>> Hi,
>>
>> This patch implements the following TODO item:
>>
>> Allow COPY in CSV mode to control whether a quoted zero-length
>> string is treated as NULL
>>
>> Currently this is always treated as a zero-length string,
>> which generates an error when loading into an integer column
>>
>> Re: [PATCHES] allow CSV quote in NULL
>> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
>>
>>
>> http://wiki.postgresql.org/wiki/Todo#COPY
>>
>>
>> I had a very definite use-case for this functionality recently while importing
>> CSV files generated by Oracle, and was somewhat frustrated by the existence
>> of a FORCE_NOT_NULL option for specific columns, but not one for
>> FORCE_NULL.
>
> While going through documentation of this patch to understand it's
> usage, I found a small mistake.
>
> + Force the specified columns' values to be converted to <literal>NULL</>
> + if the value contains an empty string.
>
> It seems quote after columns is wrong.

That's a correct plural possessive in English, but in might be better
worded as "Force any empty string encountered in the input for the
specified columns to be interpreted as a NULL value."

> Also if your use case is to treat empty strings as NULL (as per above
> documentation), can't it be handled with "WITH NULL AS" option.
> For example, something like:
>
> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself.
>>> 50,
>>> \.
> postgres=# select * from testnull;
> a | b
> ----+------
> 50 | NULL
> (1 row)

Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-07 19:25:14
Message-ID: 52530A9A.6060903@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/07/2013 03:06 PM, Robert Haas wrote:
>
>> Also if your use case is to treat empty strings as NULL (as per above
>> documentation), can't it be handled with "WITH NULL AS" option.
>> For example, something like:
>>
>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>> Enter data to be copied followed by a newline.
>> End with a backslash and a period on a line by itself.
>>>> 50,
>>>> \.
>> postgres=# select * from testnull;
>> a | b
>> ----+------
>> 50 | NULL
>> (1 row)
> Good point. If this patch is just implementing something that can
> already be done with another syntax, we don't need it.
>

Isn't the point of this option to allow a *quoted* empty string to be
forced to NULL? If so, this is not testing the same case - in fact the
COPY command above just makes explicit the default CSV NULL setting anyway.

cheers

andrew


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-08 03:34:25
Message-ID: CAA4eK1+6sGsk+YGJgFENC-g5OOMnt3-2UZSLGJ3iqcyWwgxQdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>
>>
>>> Also if your use case is to treat empty strings as NULL (as per above
>>> documentation), can't it be handled with "WITH NULL AS" option.
>>> For example, something like:
>>>
>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>> Enter data to be copied followed by a newline.
>>> End with a backslash and a period on a line by itself.
>>>>>
>>>>> 50,
>>>>> \.
>>>
>>> postgres=# select * from testnull;
>>> a | b
>>> ----+------
>>> 50 | NULL
>>> (1 row)
>>
>> Good point. If this patch is just implementing something that can
>> already be done with another syntax, we don't need it.
>>
>
>
> Isn't the point of this option to allow a *quoted* empty string to be forced
> to NULL? If so, this is not testing the same case - in fact the COPY command
> above just makes explicit the default CSV NULL setting anyway.

I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.

By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-08 12:33:13
Message-ID: 5253FB89.7090004@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/07/2013 11:34 PM, Amit Kapila wrote:
> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>
>>>> Also if your use case is to treat empty strings as NULL (as per above
>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>> For example, something like:
>>>>
>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>> Enter data to be copied followed by a newline.
>>>> End with a backslash and a period on a line by itself.
>>>>>> 50,
>>>>>> \.
>>>> postgres=# select * from testnull;
>>>> a | b
>>>> ----+------
>>>> 50 | NULL
>>>> (1 row)
>>> Good point. If this patch is just implementing something that can
>>> already be done with another syntax, we don't need it.
>>>
>>
>> Isn't the point of this option to allow a *quoted* empty string to be forced
>> to NULL? If so, this is not testing the same case - in fact the COPY command
>> above just makes explicit the default CSV NULL setting anyway.
> I am really not sure if all the purpose of patch can be achieved by
> existing syntax, neither it is explained clearly.
> However the proposal hasn't discussed why it's not good idea to extend
> some similar syntax "COPY .. NULL" which is used to replace string
> with NULL's?
> Description of NULL says: "Specifies the string that represents a null value."
> Now why can't this syntax be extended to support quoted empty string
> if it's not supported currently?
> I have not checked completely, If it's difficult or not possible to
> support in existing syntax, then even it add's more value to introduce
> new syntax.
>
> By asking above question, I doesn't mean that we should not go for the
> new proposed syntax, rather it's to know and understand the benefit of
> new syntax, also it helps during CF review for reviewer's if the
> proposal involves new syntax and that's discussed previously.
>

Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch
allows you to specify the behaviour very explicitly on a per column
basis. I can well imagine wanting to be able to force a quoted empty
string to null for numeric fields but not for text.

The basic principal of our CSV processing is that we don't ever turn a
NULL into something quoted and we don't ever turn something quoted into
NULL. That's what lets us round-trip test just about every combination
of options. I'm only going to be happy violating that, as this patch
does, in a very explicit and controlled way.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-08 16:30:43
Message-ID: CA+Tgmobhkw_nd6GPQP8RGSRnWSBLHdSuf8UHQPja0iwqGHBk9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 8, 2013 at 8:33 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>> wrote:
>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>> Also if your use case is to treat empty strings as NULL (as per above
>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>> For example, something like:
>>>>>
>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>> Enter data to be copied followed by a newline.
>>>>> End with a backslash and a period on a line by itself.
>>>>>>>
>>>>>>> 50,
>>>>>>> \.
>>>>>
>>>>> postgres=# select * from testnull;
>>>>> a | b
>>>>> ----+------
>>>>> 50 | NULL
>>>>> (1 row)
>>>>
>>>> Good point. If this patch is just implementing something that can
>>>> already be done with another syntax, we don't need it.
>>>>
>>>
>>> Isn't the point of this option to allow a *quoted* empty string to be
>>> forced
>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>> command
>>> above just makes explicit the default CSV NULL setting anyway.
>>
>> I am really not sure if all the purpose of patch can be achieved by
>> existing syntax, neither it is explained clearly.
>> However the proposal hasn't discussed why it's not good idea to extend
>> some similar syntax "COPY .. NULL" which is used to replace string
>> with NULL's?
>> Description of NULL says: "Specifies the string that represents a null
>> value."
>> Now why can't this syntax be extended to support quoted empty string
>> if it's not supported currently?
>> I have not checked completely, If it's difficult or not possible to
>> support in existing syntax, then even it add's more value to introduce
>> new syntax.
>>
>> By asking above question, I doesn't mean that we should not go for the
>> new proposed syntax, rather it's to know and understand the benefit of
>> new syntax, also it helps during CF review for reviewer's if the
>> proposal involves new syntax and that's discussed previously.
>>
>
> Quite apart from any other consideration, this suggestion is inferior to
> what's proposed in that it's an all or nothing deal, while the patch allows
> you to specify the behaviour very explicitly on a per column basis. I can
> well imagine wanting to be able to force a quoted empty string to null for
> numeric fields but not for text.
>
> The basic principal of our CSV processing is that we don't ever turn a NULL
> into something quoted and we don't ever turn something quoted into NULL.
> That's what lets us round-trip test just about every combination of options.
> I'm only going to be happy violating that, as this patch does, in a very
> explicit and controlled way.

Andrew, are you planning to review & commit this?

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-08 16:47:14
Message-ID: 52543712.2090605@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/08/2013 12:30 PM, Robert Haas wrote:
> On Tue, Oct 8, 2013 at 8:33 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>> wrote:
>>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>>> Also if your use case is to treat empty strings as NULL (as per above
>>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>>> For example, something like:
>>>>>>
>>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>>> Enter data to be copied followed by a newline.
>>>>>> End with a backslash and a period on a line by itself.
>>>>>>>> 50,
>>>>>>>> \.
>>>>>> postgres=# select * from testnull;
>>>>>> a | b
>>>>>> ----+------
>>>>>> 50 | NULL
>>>>>> (1 row)
>>>>> Good point. If this patch is just implementing something that can
>>>>> already be done with another syntax, we don't need it.
>>>>>
>>>> Isn't the point of this option to allow a *quoted* empty string to be
>>>> forced
>>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>>> command
>>>> above just makes explicit the default CSV NULL setting anyway.
>>> I am really not sure if all the purpose of patch can be achieved by
>>> existing syntax, neither it is explained clearly.
>>> However the proposal hasn't discussed why it's not good idea to extend
>>> some similar syntax "COPY .. NULL" which is used to replace string
>>> with NULL's?
>>> Description of NULL says: "Specifies the string that represents a null
>>> value."
>>> Now why can't this syntax be extended to support quoted empty string
>>> if it's not supported currently?
>>> I have not checked completely, If it's difficult or not possible to
>>> support in existing syntax, then even it add's more value to introduce
>>> new syntax.
>>>
>>> By asking above question, I doesn't mean that we should not go for the
>>> new proposed syntax, rather it's to know and understand the benefit of
>>> new syntax, also it helps during CF review for reviewer's if the
>>> proposal involves new syntax and that's discussed previously.
>>>
>> Quite apart from any other consideration, this suggestion is inferior to
>> what's proposed in that it's an all or nothing deal, while the patch allows
>> you to specify the behaviour very explicitly on a per column basis. I can
>> well imagine wanting to be able to force a quoted empty string to null for
>> numeric fields but not for text.
>>
>> The basic principal of our CSV processing is that we don't ever turn a NULL
>> into something quoted and we don't ever turn something quoted into NULL.
>> That's what lets us round-trip test just about every combination of options.
>> I'm only going to be happy violating that, as this patch does, in a very
>> explicit and controlled way.
> Andrew, are you planning to review & commit this?
>

Yes.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-08 18:38:51
Message-ID: 5254513B.4060000@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/08/2013 12:47 PM, Andrew Dunstan wrote:
>
> On 10/08/2013 12:30 PM, Robert Haas wrote:
>> Andrew, are you planning to review & commit this?
>>
>
> Yes.
>
>

Incidentally, the patch doesn't seem to add the option to file_fdw,
which I really think it should.

cheers

andrew


From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-08 22:47:17
Message-ID: CAB8KJ=jpk9YDG88ufYp4xKL9scfj1=hQVx-dNWEo8hbH0+i=ZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/10/9 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 10/08/2013 12:47 PM, Andrew Dunstan wrote:
>>
>>
>> On 10/08/2013 12:30 PM, Robert Haas wrote:
>>>
>>> Andrew, are you planning to review & commit this?
>>>
>>
>> Yes.
>>
>>
>
> Incidentally, the patch doesn't seem to add the option to file_fdw, which I
> really think it should.

Patch author here. I'll dig out the use-case I had for this patch and have a
look at the file_fdw option, which never occurred to me. (I'm
doing some FDW-related stuff at the moment so would be more than happy
to handle that too).

(Sorry for the delay in following up on this, I kind of assumed the patch
would be squirrelled away until the next commitfest ;) )

Regards

Ian Barwick


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 03:45:39
Message-ID: CAA4eK1KO5T9zgehmMzpPSqy0bhMLrZBsd2zzjVPw-UPg7DT3Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>>
>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>> wrote:
>>>
>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>
>>>>
>>>>> Also if your use case is to treat empty strings as NULL (as per above
>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>> For example, something like:
>>>>>
>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>> Enter data to be copied followed by a newline.
>>>>> End with a backslash and a period on a line by itself.
>>>>>>>
>>>>>>> 50,
>>>>>>> \.
>>>>>
>>>>> postgres=# select * from testnull;
>>>>> a | b
>>>>> ----+------
>>>>> 50 | NULL
>>>>> (1 row)
>>>>
>>>> Good point. If this patch is just implementing something that can
>>>> already be done with another syntax, we don't need it.
>>>>
>>>
>>> Isn't the point of this option to allow a *quoted* empty string to be
>>> forced
>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>> command
>>> above just makes explicit the default CSV NULL setting anyway.
>>
>> I am really not sure if all the purpose of patch can be achieved by
>> existing syntax, neither it is explained clearly.
>> However the proposal hasn't discussed why it's not good idea to extend
>> some similar syntax "COPY .. NULL" which is used to replace string
>> with NULL's?
>> Description of NULL says: "Specifies the string that represents a null
>> value."
>> Now why can't this syntax be extended to support quoted empty string
>> if it's not supported currently?
>> I have not checked completely, If it's difficult or not possible to
>> support in existing syntax, then even it add's more value to introduce
>> new syntax.
>>
>> By asking above question, I doesn't mean that we should not go for the
>> new proposed syntax, rather it's to know and understand the benefit of
>> new syntax, also it helps during CF review for reviewer's if the
>> proposal involves new syntax and that's discussed previously.
>>
>
> Quite apart from any other consideration, this suggestion is inferior to
> what's proposed in that it's an all or nothing deal, while the patch allows
> you to specify the behaviour very explicitly on a per column basis. I can
> well imagine wanting to be able to force a quoted empty string to null for
> numeric fields but not for text.

> The basic principal of our CSV processing is that we don't ever turn a NULL
> into something quoted and we don't ever turn something quoted into NULL.
> That's what lets us round-trip test just about every combination of options.
> I'm only going to be happy violating that, as this patch does, in a very
> explicit and controlled way.

Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?
It seems from documentation that current option FORCE_NOT_NULL handles
for both (Do not match the specified columns' values against the null
string. In the default case where the null string is empty, this means
that empty values will be read as zero-length strings rather than
nulls, "even when they are not quoted.").

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 13:22:31
Message-ID: CAA4eK1KbkGvnwFv-E7uw7zQveAaRweW3m5k1W8nmtOYhkBFbgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>>>
>>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>> wrote:
>>>>
>>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>>
>>>>>
>>>>>> Also if your use case is to treat empty strings as NULL (as per above
>>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>>> For example, something like:
>>>>>>
>>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>>> Enter data to be copied followed by a newline.
>>>>>> End with a backslash and a period on a line by itself.
>>>>>>>>
>>>>>>>> 50,
>>>>>>>> \.
>>>>>>
>>>>>> postgres=# select * from testnull;
>>>>>> a | b
>>>>>> ----+------
>>>>>> 50 | NULL
>>>>>> (1 row)
>>>>>
>>>>> Good point. If this patch is just implementing something that can
>>>>> already be done with another syntax, we don't need it.
>>>>>
>>>>
>>>> Isn't the point of this option to allow a *quoted* empty string to be
>>>> forced
>>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>>> command
>>>> above just makes explicit the default CSV NULL setting anyway.
>>>
>>> I am really not sure if all the purpose of patch can be achieved by
>>> existing syntax, neither it is explained clearly.
>>> However the proposal hasn't discussed why it's not good idea to extend
>>> some similar syntax "COPY .. NULL" which is used to replace string
>>> with NULL's?
>>> Description of NULL says: "Specifies the string that represents a null
>>> value."
>>> Now why can't this syntax be extended to support quoted empty string
>>> if it's not supported currently?
>>> I have not checked completely, If it's difficult or not possible to
>>> support in existing syntax, then even it add's more value to introduce
>>> new syntax.
>>>
>>> By asking above question, I doesn't mean that we should not go for the
>>> new proposed syntax, rather it's to know and understand the benefit of
>>> new syntax, also it helps during CF review for reviewer's if the
>>> proposal involves new syntax and that's discussed previously.
>>>
>>
>> Quite apart from any other consideration, this suggestion is inferior to
>> what's proposed in that it's an all or nothing deal, while the patch allows
>> you to specify the behaviour very explicitly on a per column basis. I can
>> well imagine wanting to be able to force a quoted empty string to null for
>> numeric fields but not for text.

Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.

>> The basic principal of our CSV processing is that we don't ever turn a NULL
>> into something quoted and we don't ever turn something quoted into NULL.
>> That's what lets us round-trip test just about every combination of options.
>> I'm only going to be happy violating that, as this patch does, in a very
>> explicit and controlled way.
>
> Will this option allow only quoted empty string to be NULL or will
> handle without quoted empty string as well?

I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.

I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 13:52:29
Message-ID: 52555F9D.8040505@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/09/2013 09:22 AM, Amit Kapila wrote:
> On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>>>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>>> wrote:
>>>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>>>
>>>>>>> Also if your use case is to treat empty strings as NULL (as per above
>>>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>>>> For example, something like:
>>>>>>>
>>>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>>>> Enter data to be copied followed by a newline.
>>>>>>> End with a backslash and a period on a line by itself.
>>>>>>>>> 50,
>>>>>>>>> \.
>>>>>>> postgres=# select * from testnull;
>>>>>>> a | b
>>>>>>> ----+------
>>>>>>> 50 | NULL
>>>>>>> (1 row)
>>>>>> Good point. If this patch is just implementing something that can
>>>>>> already be done with another syntax, we don't need it.
>>>>>>
>>>>> Isn't the point of this option to allow a *quoted* empty string to be
>>>>> forced
>>>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>>>> command
>>>>> above just makes explicit the default CSV NULL setting anyway.
>>>> I am really not sure if all the purpose of patch can be achieved by
>>>> existing syntax, neither it is explained clearly.
>>>> However the proposal hasn't discussed why it's not good idea to extend
>>>> some similar syntax "COPY .. NULL" which is used to replace string
>>>> with NULL's?
>>>> Description of NULL says: "Specifies the string that represents a null
>>>> value."
>>>> Now why can't this syntax be extended to support quoted empty string
>>>> if it's not supported currently?
>>>> I have not checked completely, If it's difficult or not possible to
>>>> support in existing syntax, then even it add's more value to introduce
>>>> new syntax.
>>>>
>>>> By asking above question, I doesn't mean that we should not go for the
>>>> new proposed syntax, rather it's to know and understand the benefit of
>>>> new syntax, also it helps during CF review for reviewer's if the
>>>> proposal involves new syntax and that's discussed previously.
>>>>
>>> Quite apart from any other consideration, this suggestion is inferior to
>>> what's proposed in that it's an all or nothing deal, while the patch allows
>>> you to specify the behaviour very explicitly on a per column basis. I can
>>> well imagine wanting to be able to force a quoted empty string to null for
>>> numeric fields but not for text.
> Okay, but can't it be done by extending current syntax such as
> NULL FOR <col1, col2> AS ""- which would mean it will replace
> corresponding column's values as NULL if they contain empty string.
>
>>> The basic principal of our CSV processing is that we don't ever turn a NULL
>>> into something quoted and we don't ever turn something quoted into NULL.
>>> That's what lets us round-trip test just about every combination of options.
>>> I'm only going to be happy violating that, as this patch does, in a very
>>> explicit and controlled way.
>> Will this option allow only quoted empty string to be NULL or will
>> handle without quoted empty string as well?
> I had checked patch and it seems for both quoted and unquoted empty
> string, it will replace it with NULL.
> Now here it's bit different from FORCE_NOT_NULL, because already
> for un-quoted empty strings we have a way to replace them with NULL.
>
> I think having 2 different syntax for replacing empty strings (one
> for quoted and another for un-quoted) as NULL's might not be best way
> to accomplish
> this feature.
>
>

I really don't know what you're saying here.

Here is the situation we have today (assuming the default null marker of
empty-string):

default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring, quoted-empty-string
-> emptystring

and the proposal would add to that:

with force-null: empty-string -> null, quoted-empty-string -> null

So it appears to be quite on all fours with the way force_not_null works
now, it just does the reverse.

I don't see at all that your suggested alternative has any advantages
over what's been written. If you can say "NULL FOR (foo) as '""' how
will you specify the null for some other column(s)? Are we going to have
multiple such clauses? It looks like a real mess.

cheers

andrew


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 15:23:46
Message-ID: m28uy2bh8d.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I don't see at all that your suggested alternative has any advantages over
> what's been written. If you can say "NULL FOR (foo) as '""' how will you
> specify the null for some other column(s)? Are we going to have multiple
> such clauses? It looks like a real mess.

Basically the CSV files don't have out-of-band NULLs and it's then a
real mess. In the new pgloader version I've been adding per-column NULL
processing, where NULL can be either an empty string, any number of
space characters or any constant string such as "\N" or "****".

I first added a global per-file NULL representation setting, but that's
not flexible enough to make any sense really. The files we have to
import are way to "creative" in their formats.

In my view, we can slowly deprecate pgloader by including such features
in the core code or make pgloader and the like non-optional parts of
external data loading tool chain.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 15:53:26
Message-ID: 52557BF6.7060401@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/09/2013 11:23 AM, Dimitri Fontaine wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> I don't see at all that your suggested alternative has any advantages over
>> what's been written. If you can say "NULL FOR (foo) as '""' how will you
>> specify the null for some other column(s)? Are we going to have multiple
>> such clauses? It looks like a real mess.
> Basically the CSV files don't have out-of-band NULLs and it's then a
> real mess. In the new pgloader version I've been adding per-column NULL
> processing, where NULL can be either an empty string, any number of
> space characters or any constant string such as "\N" or "****".
>
> I first added a global per-file NULL representation setting, but that's
> not flexible enough to make any sense really. The files we have to
> import are way to "creative" in their formats.
>
> In my view, we can slowly deprecate pgloader by including such features
> in the core code or make pgloader and the like non-optional parts of
> external data loading tool chain.
>

The CSV code was somewhat controversial when adopted, and was never
intended to cater for all cases. I think it was accepted because it gave
good coverage of a large number of common cases without huge additional
code complexity. I think we drew the line in about the right place for
what we support, although we've extended it modestly over the years. I
seriously doubt that it will ever fully replace a utility like pgloader,
and I'm not sure that's a desirable goal in the first place.

cheers

andrew


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 17:25:56
Message-ID: 20131009172556.GB13993@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 09, 2013 at 09:52:29AM -0400, Andrew Dunstan wrote:
>
> On 10/09/2013 09:22 AM, Amit Kapila wrote:
> >On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >>>On 10/07/2013 11:34 PM, Amit Kapila wrote:
> >>>>On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
> >>>>wrote:
> >>>>>On 10/07/2013 03:06 PM, Robert Haas wrote:
> >>>>>>
> >>>>>>>Also if your use case is to treat empty strings as NULL (as per above
> >>>>>>>documentation), can't it be handled with "WITH NULL AS" option.
> >>>>>>>For example, something like:
> >>>>>>>
> >>>>>>>postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
> >>>>>>>Enter data to be copied followed by a newline.
> >>>>>>>End with a backslash and a period on a line by itself.
> >>>>>>>>>50,
> >>>>>>>>>\.
> >>>>>>>postgres=# select * from testnull;
> >>>>>>> a | b
> >>>>>>>----+------
> >>>>>>> 50 | NULL
> >>>>>>>(1 row)
> >>>>>>Good point. If this patch is just implementing something that can
> >>>>>>already be done with another syntax, we don't need it.
> >>>>>>
> >>>>>Isn't the point of this option to allow a *quoted* empty string to be
> >>>>>forced
> >>>>>to NULL? If so, this is not testing the same case - in fact the COPY
> >>>>>command
> >>>>>above just makes explicit the default CSV NULL setting anyway.
> >>>>I am really not sure if all the purpose of patch can be achieved by
> >>>>existing syntax, neither it is explained clearly.
> >>>>However the proposal hasn't discussed why it's not good idea to extend
> >>>>some similar syntax "COPY .. NULL" which is used to replace string
> >>>>with NULL's?
> >>>>Description of NULL says: "Specifies the string that represents a null
> >>>>value."
> >>>>Now why can't this syntax be extended to support quoted empty string
> >>>>if it's not supported currently?
> >>>>I have not checked completely, If it's difficult or not possible to
> >>>>support in existing syntax, then even it add's more value to introduce
> >>>>new syntax.
> >>>>
> >>>>By asking above question, I doesn't mean that we should not go for the
> >>>>new proposed syntax, rather it's to know and understand the benefit of
> >>>>new syntax, also it helps during CF review for reviewer's if the
> >>>>proposal involves new syntax and that's discussed previously.
> >>>>
> >>>Quite apart from any other consideration, this suggestion is inferior to
> >>>what's proposed in that it's an all or nothing deal, while the patch allows
> >>>you to specify the behaviour very explicitly on a per column basis. I can
> >>>well imagine wanting to be able to force a quoted empty string to null for
> >>>numeric fields but not for text.
> > Okay, but can't it be done by extending current syntax such as
> > NULL FOR <col1, col2> AS ""- which would mean it will replace
> >corresponding column's values as NULL if they contain empty string.
> >
> >>>The basic principal of our CSV processing is that we don't ever turn a NULL
> >>>into something quoted and we don't ever turn something quoted into NULL.
> >>>That's what lets us round-trip test just about every combination of options.
> >>>I'm only going to be happy violating that, as this patch does, in a very
> >>>explicit and controlled way.
> >>Will this option allow only quoted empty string to be NULL or will
> >>handle without quoted empty string as well?
> > I had checked patch and it seems for both quoted and unquoted empty
> >string, it will replace it with NULL.
> > Now here it's bit different from FORCE_NOT_NULL, because already
> >for un-quoted empty strings we have a way to replace them with NULL.
> >
> > I think having 2 different syntax for replacing empty strings (one
> >for quoted and another for un-quoted) as NULL's might not be best way
> >to accomplish
> > this feature.
> >
> >
>
>
> I really don't know what you're saying here.
>
> Here is the situation we have today (assuming the default null
> marker of empty-string):
>
> default: empty-string -> null, quoted-empty-string ->
> emptystring
> with force_not_null: empty-string -> emptystring,
> quoted-empty-string -> emptystring
>
> and the proposal would add to that:
>
> with force-null: empty-string -> null, quoted-empty-string -> null
>
> So it appears to be quite on all fours with the way force_not_null
> works now, it just does the reverse.
>
> I don't see at all that your suggested alternative has any
> advantages over what's been written. If you can say "NULL FOR (foo)
> as '""' how will you specify the null for some other column(s)?

Idea:

NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,

> Are we going to have multiple such clauses? It looks like a real
> mess.

Is that not part of what parsers ordinarily do?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 17:59:36
Message-ID: 52559988.5000606@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/09/2013 01:25 PM, David Fetter wrote:
>
> Idea:
>
> NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,
>
>

What's the point of this? How is this superior to what is currently
proposed? Having arbitrary NULL markers for different fields will
significantly increase code complexity for a case I have a hard time
believing exists to any significant degree in the real world. The ONLY
case I know of outside some fervid imaginations where we need to
distinguish between different NULL treatment is quoted vs unquoted,
which this patch rounds out. Catering for anything else seems quite
unnecessary.

What is more, I think it's actually far more obscure than what is
proposed, which fits in nicely with our existing pattern of options.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 19:25:42
Message-ID: CA+Tgmob8-m0DdNBxJhDxxw+4SF+URbu=sXfoHKqnSBp7xBBVpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 9, 2013 at 9:52 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> I really don't know what you're saying here.
>
> Here is the situation we have today (assuming the default null marker of
> empty-string):
>
> default: empty-string -> null, quoted-empty-string ->
> emptystring
> with force_not_null: empty-string -> emptystring, quoted-empty-string ->
> emptystring
>
> and the proposal would add to that:
>
> with force-null: empty-string -> null, quoted-empty-string -> null
>
> So it appears to be quite on all fours with the way force_not_null works
> now, it just does the reverse.

That principle seems sound as far as it goes, but somehow I feel we've
dug ourselves into a hole here naming-wise. Apparently an unquoted
empty string is normally null, but you can use force-not-null to make
it an empty string instead. And a quoted empty string is normally an
empty string, but you can use force-null to make it a null instead.
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
An unsuspecting user confronted with a column marked with both of
those options at the same time might be a bit perplexed.

It seems to me that it might be better to have an option called
empty_input, and then you could have four values with names we can
bikeshed about - e.g. auto (the current default behavior), null
(proposed force null), empty_string (current force not null), reversed
(force null + force not null).

(In the interest of full disclosure, there is an EDB product that as
of recently offers something very much like this, with slightly
different naming and offering only the first three of those four
options, motivated by a customer complaint about the default behavior,
which was the same as COPY's default behavior. I don't know that the
solution we adopted there has any bearing on what ought to be done
here, and I can certainly live with it if people prefer to have FORCE
NULL and FORCE NOT NULL with not-quite-opposite meanings, but I do
think it's sort of hilariously awful, right up there with constraint
exclusion vs. exclusion constraints.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 21:43:14
Message-ID: 5255CDF2.5050007@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/09/2013 03:25 PM, Robert Haas wrote:
> Therefore, a user who wants the opposite of the default behavior -
> namely, unquoted empty strings as empty strings and quoted empty
> strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.

Is there a real world example of this case? How common is it? And how
come I haven't heard of it in the nine or so years since we've been
supporting CSV import?

Frankly it strikes me as more than a little perverse to have a CSV file
where you want an unquoted empty string to map to emptystring but a
quoted one to map to null, especially if you want that for the same
field. The main two problems I have heard people people complain about are:

* some CSV producers quote everything, even NULL values. This is
particularly true where the producer doesn't treat NULLs the same
way we do. This patch will fix that, allowing you to get a null on a
column by column basis as desired.
* some tables have NOT NULL constraints that are violated by unquoted
empty strings being interpreted as NULL - that was fixed long ago by
FORCE NOT NULL.

It's faintly possible you might encounter both of these problems in some
form in the one file, but not very likely.

Regarding syntax suggestions - this is not a green field. If we were
designing the syntax of COPY from scratch today we might make other
decisions than those that were made back in 2004. But I don't think we
can add new options using a quite different style from what's already
been done. That would look more than odd.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-10 00:39:03
Message-ID: CA+TgmoY1-K1h5YT8S5nP=m27RxcZ6AD1bXgRQBcn33HJRrmA6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 9, 2013 at 5:43 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 10/09/2013 03:25 PM, Robert Haas wrote:
>> Therefore, a user who wants the opposite of the default behavior -
>> namely, unquoted empty strings as empty strings and quoted empty
>> strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
>
> Is there a real world example of this case? How common is it? And how come I
> haven't heard of it in the nine or so years since we've been supporting CSV
> import?

I doubt there is any real world use case. My point is just that FORCE
NULL and FORCE NOT NULL are not opposites, and that will probably
confuse some people.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-10 00:54:54
Message-ID: 5255FADE.5080407@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/09/2013 08:39 PM, Robert Haas wrote:
> On Wed, Oct 9, 2013 at 5:43 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 10/09/2013 03:25 PM, Robert Haas wrote:
>>> Therefore, a user who wants the opposite of the default behavior -
>>> namely, unquoted empty strings as empty strings and quoted empty
>>> strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
>> Is there a real world example of this case? How common is it? And how come I
>> haven't heard of it in the nine or so years since we've been supporting CSV
>> import?
> I doubt there is any real world use case. My point is just that FORCE
> NULL and FORCE NOT NULL are not opposites, and that will probably
> confuse some people.

I'm not wedded to the exact syntax. If you think this will lead to
confusion we could call it QUOTED NULL or some such. Doing it cleanly
without adding a new key word could stretch the imagination some.

cheers

andrew


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-10 03:47:04
Message-ID: CAA4eK1+-+A09acLR7WOyQ_XPdOtkPu1YU5pvRaDACCi4Kg6QOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 9, 2013 at 7:22 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 10/09/2013 09:22 AM, Amit Kapila wrote:
>>
>> On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>>>
>>> On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>> wrote:
>>>>
>>>> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>>>>>
>>>>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>>>> wrote:
>>>>>>
>>>>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Also if your use case is to treat empty strings as NULL (as per
>>>>>>>> above
>>>>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>>>>> For example, something like:
>>>>>>>>
>>>>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>>>>> Enter data to be copied followed by a newline.
>>>>>>>> End with a backslash and a period on a line by itself.
>>>>>>>>>>
>>>>>>>>>> 50,
>>>>>>>>>> \.
>>>>>>>>
>>>>>>>> postgres=# select * from testnull;
>>>>>>>> a | b
>>>>>>>> ----+------
>>>>>>>> 50 | NULL
>>>>>>>> (1 row)
>>>>>>>
>>>>>>> Good point. If this patch is just implementing something that can
>>>>>>> already be done with another syntax, we don't need it.
>>>>>>>
>>>>>> Isn't the point of this option to allow a *quoted* empty string to be
>>>>>> forced
>>>>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>>>>> command
>>>>>> above just makes explicit the default CSV NULL setting anyway.
>>>>>
>>>>> I am really not sure if all the purpose of patch can be achieved by
>>>>> existing syntax, neither it is explained clearly.
>>>>> However the proposal hasn't discussed why it's not good idea to extend
>>>>> some similar syntax "COPY .. NULL" which is used to replace string
>>>>> with NULL's?
>>>>> Description of NULL says: "Specifies the string that represents a null
>>>>> value."
>>>>> Now why can't this syntax be extended to support quoted empty string
>>>>> if it's not supported currently?
>>>>> I have not checked completely, If it's difficult or not possible to
>>>>> support in existing syntax, then even it add's more value to introduce
>>>>> new syntax.
>>>>>
>>>>> By asking above question, I doesn't mean that we should not go for the
>>>>> new proposed syntax, rather it's to know and understand the benefit of
>>>>> new syntax, also it helps during CF review for reviewer's if the
>>>>> proposal involves new syntax and that's discussed previously.
>>>>>
>>>> Quite apart from any other consideration, this suggestion is inferior to
>>>> what's proposed in that it's an all or nothing deal, while the patch
>>>> allows
>>>> you to specify the behaviour very explicitly on a per column basis. I
>>>> can
>>>> well imagine wanting to be able to force a quoted empty string to null
>>>> for
>>>> numeric fields but not for text.
>>
>> Okay, but can't it be done by extending current syntax such as
>> NULL FOR <col1, col2> AS ""- which would mean it will replace
>> corresponding column's values as NULL if they contain empty string.
>>
>>>> The basic principal of our CSV processing is that we don't ever turn a
>>>> NULL
>>>> into something quoted and we don't ever turn something quoted into NULL.
>>>> That's what lets us round-trip test just about every combination of
>>>> options.
>>>> I'm only going to be happy violating that, as this patch does, in a very
>>>> explicit and controlled way.
>>>
>>> Will this option allow only quoted empty string to be NULL or will
>>> handle without quoted empty string as well?
>>
>> I had checked patch and it seems for both quoted and unquoted empty
>> string, it will replace it with NULL.
>> Now here it's bit different from FORCE_NOT_NULL, because already
>> for un-quoted empty strings we have a way to replace them with NULL.
>>
>> I think having 2 different syntax for replacing empty strings (one
>> for quoted and another for un-quoted) as NULL's might not be best way
>> to accomplish
>> this feature.
>>
>>
>
>
> I really don't know what you're saying here.
>
> Here is the situation we have today (assuming the default null marker of
> empty-string):
>
> default: empty-string -> null, quoted-empty-string ->
> emptystring
> with force_not_null: empty-string -> emptystring, quoted-empty-string ->
> emptystring
>
> and the proposal would add to that:
>
> with force-null: empty-string -> null, quoted-empty-string -> null
>
> So it appears to be quite on all fours with the way force_not_null works
> now, it just does the reverse.
>
> I don't see at all that your suggested alternative has any advantages over
> what's been written. If you can say "NULL FOR (foo) as '""' how will you
> specify the null for some other column(s)? Are we going to have multiple
> such clauses?

No, if user wants for particular string (ex. quoted empty string)
that few of it's columns becomes NULL, he can specify them together (
NULL for (col1, col2)).

> It looks like a real mess.

One of the advantage, I could see using "NULL For .." syntax is
that already we have one syntax with which user can specify what
strings can be replaced with NULL, now just to handle quoted empty
string why to add different syntax.

"FORCE_NULL" has advantage that it can be used for some columns rather
than all columns, but I think for that existing syntax can also be
modified to support it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-10 13:15:28
Message-ID: 5256A870.5000303@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/09/2013 11:47 PM, Amit Kapila wrote:
>
> One of the advantage, I could see using "NULL For .." syntax is
> that already we have one syntax with which user can specify what
> strings can be replaced with NULL, now just to handle quoted empty
> string why to add different syntax.
>
> "FORCE_NULL" has advantage that it can be used for some columns rather
> than all columns, but I think for that existing syntax can also be
> modified to support it.
>
>
>

I think it's badly designed on its face. We don't need and shouldn't
provide a facility for different NULL markers. A general facility for
that would be an ugly an quite pointless addition to code complexity.
What we need is simply a way of altering one specific behaviour, namely
the treatment of quoted NULL markers. We should not do that by allowing
munging the NULL marker per column, but by a syntactical mechanism that
directly addresses the change in behaviour. If you don't like "FORCE
NULL" then let's pick something else, but not this ugly and unnecessary
"NULL FOR" gadget.

cheers

andrew


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-11 04:49:38
Message-ID: CAA4eK1+t6H3LZ18ofuVMbFhsyApp_LynnGm18fFc4jNyxrSXjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 10, 2013 at 6:45 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 10/09/2013 11:47 PM, Amit Kapila wrote:
>>
>>
>> One of the advantage, I could see using "NULL For .." syntax is
>> that already we have one syntax with which user can specify what
>> strings can be replaced with NULL, now just to handle quoted empty
>> string why to add different syntax.
>>
>> "FORCE_NULL" has advantage that it can be used for some columns rather
>> than all columns, but I think for that existing syntax can also be
>> modified to support it.
>>
>>
>>
>
>
> I think it's badly designed on its face. We don't need and shouldn't
> provide a facility for different NULL markers. A general facility for that
> would be an ugly an quite pointless addition to code complexity. What we
> need is simply a way of altering one specific behaviour, namely the
> treatment of quoted NULL markers. We should not do that by allowing munging
> the NULL marker per column,

I was thinking it to similar in some sense with "insert into tbl"
statement. For example
Create table tbl (c1 int, c2 int, c3 int, c4 int)
insert into tbl (col2) values(1);

Here after table name, user can specify column names for which he
wants to provide specific values.

> but by a syntactical mechanism that directly
> addresses the change in behaviour. If you don't like "FORCE NULL" then let's
> pick something else, but not this ugly and unnecessary "NULL FOR" gadget.

If you don't like idea of one generic syntax for NULLs in COPY
command, then "FORCE_QUOTED_NULL" or "QUOTED_NULL" will make more
sense as compare to FORCE_NULL.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-11 13:11:17
Message-ID: CA+Tgmoauzeu5dwJ_PTmMgU_3-gJ3UoDB0HS3keL0T7MgZ7Yxvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 11, 2013 at 12:49 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Oct 10, 2013 at 6:45 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>> On 10/09/2013 11:47 PM, Amit Kapila wrote:
>>>
>>>
>>> One of the advantage, I could see using "NULL For .." syntax is
>>> that already we have one syntax with which user can specify what
>>> strings can be replaced with NULL, now just to handle quoted empty
>>> string why to add different syntax.
>>>
>>> "FORCE_NULL" has advantage that it can be used for some columns rather
>>> than all columns, but I think for that existing syntax can also be
>>> modified to support it.
>>>
>>>
>>>
>>
>>
>> I think it's badly designed on its face. We don't need and shouldn't
>> provide a facility for different NULL markers. A general facility for that
>> would be an ugly an quite pointless addition to code complexity. What we
>> need is simply a way of altering one specific behaviour, namely the
>> treatment of quoted NULL markers. We should not do that by allowing munging
>> the NULL marker per column,
>
> I was thinking it to similar in some sense with "insert into tbl"
> statement. For example
> Create table tbl (c1 int, c2 int, c3 int, c4 int)
> insert into tbl (col2) values(1);
>
> Here after table name, user can specify column names for which he
> wants to provide specific values.
>
>> but by a syntactical mechanism that directly
>> addresses the change in behaviour. If you don't like "FORCE NULL" then let's
>> pick something else, but not this ugly and unnecessary "NULL FOR" gadget.
>
> If you don't like idea of one generic syntax for NULLs in COPY
> command, then "FORCE_QUOTED_NULL" or "QUOTED_NULL" will make more
> sense as compare to FORCE_NULL.

Meh. As amused as I am with our bad naming, I don't think there's
anything too terribly wrong with FORCE NULL. Symmetry has some value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company