Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode

Lists: pgsql-hackers
From: Payal Singh <payal(at)omniti(dot)com>
To: barwick(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-31 19:49:06
Message-ID: CANUg7LDW0WDrDpMhVHce9EDrOpR4O2_op6ySU3k0JoSdxrmHFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The post was made before I subscribed to the mailing list, so posting my
review separately. The link to the original patch mail is
http://www.postgresql.org/message-id/CAB8KJ=jS-Um4TGwenS5wLUfJK6K4rNOm_V6GRUj+tcKekL2=GQ@mail.gmail.com

> Hi,
>
> This patch implements the following TODO item:
>
> Allow COPY in CSV mode to control whether a quoted zero-length
> string is treated as NULL
>
> Currently this is always treated as a zero-length string,
> which generates an error when loading into an integer column
>
> Re: [PATCHES] allow CSV quote in NULL
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
>
>
> http://wiki.postgresql.org/wiki/Todo#COPY
>
>
> I had a very definite use-case for this functionality recently while importing
> CSV files generated by Oracle, and was somewhat frustrated by the existence
> of a FORCE_NOT_NULL option for specific columns, but not one for
> FORCE_NULL.
>
> I'll add this to the November commitfest.
>
>
> Regards
>
> Ian Barwick
>
>
==================
Contents & Purpose
==================

This patch introduces a new 'FORCE_NULL' option which has the opposite
functionality of the already present 'FORCE_NOT_NULL' option for the COPY
command. Prior to this option there was no way to convert a zeroed-length
quoted value to a NULL value when COPY FROM is used to import data from CSV
formatted files.

==================
Submission Review
==================

The patch is in the correct context diff format. It includes changes to the
documentation as well as additional regression tests. The description has
been discussed and defined in the previous mails leading to this patch.

=====================
Functionality Review
=====================

CORRECTION NEEDED - Due to a minor error in code (details in 'Code Review'
section below), force_null option is not limited to COPY FROM, and works
even when COPY TO is used. This should instead give an error message.

The updated documentation describes the added functionality clearly.

All regression tests passed successfully.

Code compilation after including patch was successful. No warnings either.

Manually tested COPY FROM with FORCE_NULL for a number of scenarios, all
with expected outputs. No issues.

Been testing the patch for a few days, no crashes or weird behavior
observed.

=============================================
Code Formatting Review (Needs Improvement)
=============================================

Looks good, the tabs between variable declaration and accompanying comment
can be improved.

=================================
Code Review (Needs Improvement)
=================================

1. There is a " NOTE: force_not_null option are not applied to the returned
fields." before COPY FROM block. Similar note should be added for
force_null option too.

2. One of the conditions to check and give an error if force_null is true
and copy from is false is wrong, cstate->force_null should be checked
instead of cstate->force_notnull:

/* Check force_notnull */
if (!cstate->csv_mode && cstate->force_notnull != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force not null available only
in CSV mode")));
if (cstate->force_notnull != NIL && !is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force not null only available using
COPY FROM")));

/* Check force_null */
if (!cstate->csv_mode && cstate->force_null != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force null available only in
CSV mode")));

==> if (cstate->force_notnull != NIL && !is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force null only available using COPY
FROM")));

===============================
Suggested Changes & Conclusion
===============================

The Above mentioned error condition should be corrected. Minor comments and
tab changes are upto the author.

In all, suggested modifications aside, the patch works well and in my
opinion, would be a useful addition to the COPY command.

Payal Singh,
OmniTi Computer Consulting Inc.
Junior Database Architect


From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: Payal Singh <payal(at)omniti(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-01-28 10:55:13
Message-ID: CAB8KJ=jVfVi3U8DTn_K05spir6jJoSGFnWVVvvrzL7tK9dOUeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013-11-01 Payal Singh <payal(at)omniti(dot)com>:
> The post was made before I subscribed to the mailing list, so posting my
> review separately. The link to the original patch mail is
> http://www.postgresql.org/message-id/CAB8KJ=jS-Um4TGwenS5wLUfJK6K4rNOm_V6GRUj+tcKekL2=GQ@mail.gmail.com
>
>>
>> Hi,
>>
>> This patch implements the following TODO item:
>>
>> Allow COPY in CSV mode to control whether a quoted zero-length
>> string is treated as NULL
>>
>> Currently this is always treated as a zero-length string,
>> which generates an error when loading into an integer column
>>
>> Re: [PATCHES] allow CSV quote in NULL
>> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
>>
>>
>> http://wiki.postgresql.org/wiki/Todo#COPY
>>
>>
>> I had a very definite use-case for this functionality recently while
>> importing
>> CSV files generated by Oracle, and was somewhat frustrated by the
>> existence
>> of a FORCE_NOT_NULL option for specific columns, but not one for
>> FORCE_NULL.
>>
>> I'll add this to the November commitfest.
>>
>>
>> Regards
>>
>> Ian Barwick
>
>
> ==================
> Contents & Purpose
> ==================
>
> This patch introduces a new 'FORCE_NULL' option which has the opposite
> functionality of the already present 'FORCE_NOT_NULL' option for the COPY
> command. Prior to this option there was no way to convert a zeroed-length
> quoted value to a NULL value when COPY FROM is used to import data from CSV
> formatted files.
>
> ==================
> Submission Review
> ==================
>
> The patch is in the correct context diff format. It includes changes to the
> documentation as well as additional regression tests. The description has
> been discussed and defined in the previous mails leading to this patch.
>
> =====================
> Functionality Review
> =====================
>
> CORRECTION NEEDED - Due to a minor error in code (details in 'Code Review'
> section below), force_null option is not limited to COPY FROM, and works
> even when COPY TO is used. This should instead give an error message.
>
> The updated documentation describes the added functionality clearly.
>
> All regression tests passed successfully.
>
> Code compilation after including patch was successful. No warnings either.
>
> Manually tested COPY FROM with FORCE_NULL for a number of scenarios, all
> with expected outputs. No issues.
>
> Been testing the patch for a few days, no crashes or weird behavior
> observed.
>
> =============================================
> Code Formatting Review (Needs Improvement)
> =============================================
>
> Looks good, the tabs between variable declaration and accompanying comment
> can be improved.
>
> =================================
> Code Review (Needs Improvement)
> =================================
>
> 1. There is a " NOTE: force_not_null option are not applied to the returned
> fields." before COPY FROM block. Similar note should be added for force_null
> option too.
>
> 2. One of the conditions to check and give an error if force_null is true
> and copy from is false is wrong, cstate->force_null should be checked
> instead of cstate->force_notnull:
>
> /* Check force_notnull */
> if (!cstate->csv_mode && cstate->force_notnull != NIL)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("COPY force not null available only
> in CSV mode")));
> if (cstate->force_notnull != NIL && !is_from)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("COPY force not null only available using
> COPY FROM")));
>
> /* Check force_null */
> if (!cstate->csv_mode && cstate->force_null != NIL)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("COPY force null available only in
> CSV mode")));
>
> ==> if (cstate->force_notnull != NIL && !is_from)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("COPY force null only available using COPY
> FROM")));
>
> ===============================
> Suggested Changes & Conclusion
> ===============================
>
> The Above mentioned error condition should be corrected. Minor comments and
> tab changes are upto the author.
>
> In all, suggested modifications aside, the patch works well and in my
> opinion, would be a useful addition to the COPY command.

Hi Payal

Many thanks for the review, and my apologies for not getting back to
you earlier.

Updated version of the patch attached with suggested corrections. I'm not
sure about the tabs in the variable declarations - the whole section
seems to be all over the place (regardless of whether tabs are set to 4 or
8 spaces) and could do with tidying up, however I didn't want to expand the
scope of the patch too much.

Quick recap of the reasons behind this patch - we had a bunch of CSV files
(and by "bunch" I mean totalling hundreds of millions of lines) exported
from a data source which was unable to distinguish between an empty string
and a null value. Too late we realized we had a ridiculous number of
comma separated rows with some empty strings which should be kept as such,
and some empty strings which should be imported as NULLs. "Wait", I said,
for I remembered COPY FROM had some kind of option involving specifying
the NULL status of certain columns. Alas it turned out to be the opposite
of what we required, and we found another workaround, but I thought as
it's likely we would face a similar situation in the future, it would
be handy to have the option available.

Example:

Given a table like this (a very stripped-down version of the original use
case):

CREATE TABLE nulltest (
prefix TEXT NOT NULL DEFAULT '',
altname TEXT DEFAULT NULL
);

I want to be able to do this:

postgres=# COPY nulltest FROM STDIN WITH (FORMAT CSV, FORCE_NULL (altname));
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> "",""
>> \.
postgres=# \pset null NULL
Null display (null) is "NULL".
postgres=# SELECT * FROM nulltest ;
prefix | altname
--------+---------
| NULL
(1 row)

I don't have any strong feelings about the syntax - as is it's just
the logical opposite of what's already available.

Regards

Ian Barwick

Attachment Content-Type Size
copy-force-null-v2.patch application/octet-stream 9.9 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-01-28 15:12:25
Message-ID: 52E7C8D9.1050207@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:
>
> Hi Payal
>
> Many thanks for the review, and my apologies for not getting back to
> you earlier.
>
> Updated version of the patch attached with suggested corrections.

On a very quick glance, I see that you have still not made adjustments
to contrib/file_fdw to accommodate this new option. I don't see why this
COPY option should be different in that respect.

cheers

andrew


From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-01-28 15:28:03
Message-ID: CAB8KJ=gKW-mncg=mFe3h2mR7ji4kMBdJcCqx39UWQ02FnPMATA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-01-29 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:
>>
>>
>> Hi Payal
>>
>> Many thanks for the review, and my apologies for not getting back to
>> you earlier.
>>
>> Updated version of the patch attached with suggested corrections.
>
> On a very quick glance, I see that you have still not made adjustments to
> contrib/file_fdw to accommodate this new option. I don't see why this COPY
> option should be different in that respect.

Hmm, that idea seems to have escaped me completely. I'll get onto it forthwith.

Regards

Ian Barwick


From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-01-29 15:59:41
Message-ID: CAB8KJ=gKJm1xx6SVZrRcrjNGm3Do9CMC81z49ny52L5FGzSm6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014/1/29 Ian Lawrence Barwick <barwick(at)gmail(dot)com>:
> 2014-01-29 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>
>> On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:
>>>
>>>
>>> Hi Payal
>>>
>>> Many thanks for the review, and my apologies for not getting back to
>>> you earlier.
>>>
>>> Updated version of the patch attached with suggested corrections.
>>
>> On a very quick glance, I see that you have still not made adjustments to
>> contrib/file_fdw to accommodate this new option. I don't see why this COPY
>> option should be different in that respect.
>
> Hmm, that idea seems to have escaped me completely. I'll get onto it forthwith.

Striking while the keyboard is hot... version with contrib/file_fdw
modifications
attached.

Regards

Ian Barwick

Attachment Content-Type Size
copy-force-null-v3.patch text/x-patch 24.4 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-01 23:26:03
Message-ID: 53126C8B.30309@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 01/29/2014 10:59 AM, Ian Lawrence Barwick wrote:
> 2014/1/29 Ian Lawrence Barwick <barwick(at)gmail(dot)com>:
>> 2014-01-29 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>> On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:
>>>>
>>>> Hi Payal
>>>>
>>>> Many thanks for the review, and my apologies for not getting back to
>>>> you earlier.
>>>>
>>>> Updated version of the patch attached with suggested corrections.
>>> On a very quick glance, I see that you have still not made adjustments to
>>> contrib/file_fdw to accommodate this new option. I don't see why this COPY
>>> option should be different in that respect.
>> Hmm, that idea seems to have escaped me completely. I'll get onto it forthwith.
> Striking while the keyboard is hot... version with contrib/file_fdw
> modifications
> attached.
>
>

I have reviewed this. Generally it's good, but the author has made a
significant error - the idea is not to force a quoted empty string to
null, but to force a quoted null string to null, whatever the null
string might be. The default case has these the same, but if you specify
a non-empty null string they aren't.

That difference actually made the file_fdw regression results plain
wrong, in my view, in that they expected a quoted empty string to be
turned to null even when the null string was something else.

I've adjusted this and the docs and propose to apply the attached patch
in the next day or two unless there are any objections.

cheers

andrew

Attachment Content-Type Size
copy-force-null-v4.patch text/x-patch 21.1 KB

From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-03 03:06:19
Message-ID: CAB8KJ=gGG6pZ=4i4QMLLbQ1Qy_zQrmA3sfhT+wXjHN0EWf-tNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-03-02 8:26 GMT+09:00 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 01/29/2014 10:59 AM, Ian Lawrence Barwick wrote:
>>
>> 2014/1/29 Ian Lawrence Barwick <barwick(at)gmail(dot)com>:
>>>
>>> 2014-01-29 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>>>
>>>> On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:
>>>>>
>>>>>
>>>>> Hi Payal
>>>>>
>>>>> Many thanks for the review, and my apologies for not getting back to
>>>>> you earlier.
>>>>>
>>>>> Updated version of the patch attached with suggested corrections.
>>>>
>>>> On a very quick glance, I see that you have still not made adjustments
>>>> to
>>>> contrib/file_fdw to accommodate this new option. I don't see why this
>>>> COPY
>>>> option should be different in that respect.
>>>
>>> Hmm, that idea seems to have escaped me completely. I'll get onto it
>>> forthwith.
>>
>> Striking while the keyboard is hot... version with contrib/file_fdw
>> modifications
>> attached.
>>
>>

> I have reviewed this. Generally it's good, but the author has made a
> significant error - the idea is not to force a quoted empty string to null,
> but to force a quoted null string to null, whatever the null string might
> be. The default case has these the same, but if you specify a non-empty null
> string they aren't.

The author slaps himself on the forehead while regretting he was temporally
constricted when dealing with the patch and never thought to look beyond
the immediate use case.

Thanks for the update, much appreciated.

> That difference actually made the file_fdw regression results plain wrong,
> in my view, in that they expected a quoted empty string to be turned to null
> even when the null string was something else.
>
> I've adjusted this and the docs and propose to apply the attached patch in
> the next day or two unless there are any objections.

Unless I'm overlooking something, output from "SELECT * FROM text_csv;"
in 'output/file_fdw.source' still needs updating?

Regards

Ian Barwick

Attachment Content-Type Size
copy-force-null-v5.patch text/x-patch 25.2 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-03 04:13:19
Message-ID: 5314015F.4020509@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 03/02/2014 10:06 PM, Ian Lawrence Barwick wrote:
> 2014-03-02 8:26 GMT+09:00 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>> On 01/29/2014 10:59 AM, Ian Lawrence Barwick wrote:
>>> 2014/1/29 Ian Lawrence Barwick <barwick(at)gmail(dot)com>:
>>>> 2014-01-29 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>>>> On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:
>>>>>>
>>>>>> Hi Payal
>>>>>>
>>>>>> Many thanks for the review, and my apologies for not getting back to
>>>>>> you earlier.
>>>>>>
>>>>>> Updated version of the patch attached with suggested corrections.
>>>>> On a very quick glance, I see that you have still not made adjustments
>>>>> to
>>>>> contrib/file_fdw to accommodate this new option. I don't see why this
>>>>> COPY
>>>>> option should be different in that respect.
>>>> Hmm, that idea seems to have escaped me completely. I'll get onto it
>>>> forthwith.
>>> Striking while the keyboard is hot... version with contrib/file_fdw
>>> modifications
>>> attached.
>>>
>>>
>> I have reviewed this. Generally it's good, but the author has made a
>> significant error - the idea is not to force a quoted empty string to null,
>> but to force a quoted null string to null, whatever the null string might
>> be. The default case has these the same, but if you specify a non-empty null
>> string they aren't.
> The author slaps himself on the forehead while regretting he was temporally
> constricted when dealing with the patch and never thought to look beyond
> the immediate use case.
>
> Thanks for the update, much appreciated.
>
>> That difference actually made the file_fdw regression results plain wrong,
>> in my view, in that they expected a quoted empty string to be turned to null
>> even when the null string was something else.
>>
>> I've adjusted this and the docs and propose to apply the attached patch in
>> the next day or two unless there are any objections.
> Unless I'm overlooking something, output from "SELECT * FROM text_csv;"
> in 'output/file_fdw.source' still needs updating?
>
>

Yes, you're right. Will fix.

cheers

andrew


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-03 11:48:53
Message-ID: CAB7nPqTSyJisq4ba2i-iErwVk+2RGe62YBqK5KnF9WkQHEfj-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 3, 2014 at 1:13 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> That difference actually made the file_fdw regression results plain
>>> wrong,
>>> in my view, in that they expected a quoted empty string to be turned to
>>> null
>>> even when the null string was something else.
>>>
>>> I've adjusted this and the docs and propose to apply the attached patch
>>> in
>>> the next day or two unless there are any objections.
>>
>> Unless I'm overlooking something, output from "SELECT * FROM text_csv;"
>> in 'output/file_fdw.source' still needs updating?
While reading this patch, I found a small typo in copy2.[sql|out] =>
s/violiaton/violation/g
Regards,
--
Michael


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-04 22:44:52
Message-ID: 53165764.6020004@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 03/03/2014 06:48 AM, Michael Paquier wrote:
> On Mon, Mar 3, 2014 at 1:13 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>> That difference actually made the file_fdw regression results plain
>>>> wrong,
>>>> in my view, in that they expected a quoted empty string to be turned to
>>>> null
>>>> even when the null string was something else.
>>>>
>>>> I've adjusted this and the docs and propose to apply the attached patch
>>>> in
>>>> the next day or two unless there are any objections.
>>> Unless I'm overlooking something, output from "SELECT * FROM text_csv;"
>>> in 'output/file_fdw.source' still needs updating?
> While reading this patch, I found a small typo in copy2.[sql|out] =>
> s/violiaton/violation/g

I have picked this up and committed the patch. Thanks to all.

cheers

andrew


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 12:49:30
Message-ID: CAB7nPqTjFZBS7kvAPsVVN4-CeRQurfXXXQgVrQo8RuvybndUNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 5, 2014 at 7:44 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> I have picked this up and committed the patch. Thanks to all.
Sorry for coming after the battle, but while looking at what has been
committed I noticed that copy2.sql is actually doing twice in a row
the same test:
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv,
FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv,
FORCE_NOT_NULL(b), FORCE_NULL(c));
2,,""
\.

See? For both tests the quotes are placed on the same column, the 3rd.
I think that one of them should be like that, with the quotes on the
2nd column => 2,"",
The attached patch corrects that... and a misplaced comment.
Regards,
--
Michael

Attachment Content-Type Size
20140305_fix_copy_null_tests.patch text/plain 1.7 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 14:11:41
Message-ID: CAB7nPqRquusDi291GdjTjQq2Zv8aP8JGFP6NWM_84O7eU5bznA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

After testing this feature, I noticed that FORCE_NULL and
FORCE_NOT_NULL can both be specified with COPY on the same column.
This does not seem correct. The attached patch adds some more error
handling, and a regression test case for that.
Regards,
--
Michael

Attachment Content-Type Size
20140305_copy_force_error.patch text/plain 2.3 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 14:27:56
Message-ID: 5317346C.9000501@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 03/05/2014 09:11 AM, Michael Paquier wrote:
> After testing this feature, I noticed that FORCE_NULL and
> FORCE_NOT_NULL can both be specified with COPY on the same column.
> This does not seem correct. The attached patch adds some more error
> handling, and a regression test case for that.
>

Strictly they are not actually contradictory, since FORCE NULL relates
to quoted null strings and FORCE NOT NULL relates to unquoted null
strings. Arguably the docs are slightly loose on this point. Still,
applying both FORCE NULL and FORCE NOT NULL to the same column would be
rather perverse, since it would result in a quoted null string becoming
null and an unquoted null string becoming not null.

I'd be more inclined just to tighten the docs and maybe expand the
regression tests a bit, but I could be persuaded the other way if people
think it's worth it.

cheers

andrew


From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 14:37:18
Message-ID: CAB8KJ=iOTV3-VmYCGX0wFj2eEDXza7so-LvaKTJRFffd5vxwfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-03-05 23:27 GMT+09:00 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 03/05/2014 09:11 AM, Michael Paquier wrote:
>>
>> After testing this feature, I noticed that FORCE_NULL and
>> FORCE_NOT_NULL can both be specified with COPY on the same column.
>> This does not seem correct. The attached patch adds some more error
>> handling, and a regression test case for that.
>>
>
>
> Strictly they are not actually contradictory, since FORCE NULL relates to
> quoted null strings and FORCE NOT NULL relates to unquoted null strings.
> Arguably the docs are slightly loose on this point. Still, applying both
> FORCE NULL and FORCE NOT NULL to the same column would be rather perverse,
> since it would result in a quoted null string becoming null and an unquoted
> null string becoming not null.

Too frazzled to recall clearly right now, but I think that was the somewhat
counterintuitive conclusion I originally came to.

> I'd be more inclined just to tighten the docs and maybe expand the
> regression tests a bit, but I could be persuaded the other way if people
> think it's worth it.

Might be worth doing if it's an essentially useless feature, if only to
preempt an unending chain of bug reports.

Many thanks for everyone's input on this, and apologies for not giving
the patch enough rigorous attention.

Regards

Ian Barwick


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 14:58:47
Message-ID: CAB7nPqS=dbePrb-NiDfE+JZGB6WXLksrwDoFuRkHtjEn0tXYTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 5, 2014 at 11:37 PM, Ian Lawrence Barwick <barwick(at)gmail(dot)com> wrote:
> 2014-03-05 23:27 GMT+09:00 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>
>> On 03/05/2014 09:11 AM, Michael Paquier wrote:
>>>
>>> After testing this feature, I noticed that FORCE_NULL and
>>> FORCE_NOT_NULL can both be specified with COPY on the same column.
>>> This does not seem correct. The attached patch adds some more error
>>> handling, and a regression test case for that.
>>>
>>
>>
>> Strictly they are not actually contradictory, since FORCE NULL relates to
>> quoted null strings and FORCE NOT NULL relates to unquoted null strings.
>> Arguably the docs are slightly loose on this point. Still, applying both
>> FORCE NULL and FORCE NOT NULL to the same column would be rather perverse,
>> since it would result in a quoted null string becoming null and an unquoted
>> null string becoming not null.
>
> Too frazzled to recall clearly right now, but I think that was the somewhat
> counterintuitive conclusion I originally came to.
In this case I may be an intuitive guy :), but OK I see your point. So
if we specify both this produces the exact opposite as the default,
default being an empty string inserted for a quoted empty string and
NULL inserted for a non-quoted empty string. So yes I'm fine with a
note on the docs about that, and some more regression tests.

Btw, if we allow this behavior in COPY, why doesn't file_fdw allow
both options to be allowed on the same column for a foreign table?
Current behavior of file_fdw seems rather inconsistent with COPY as it
stands now.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 15:03:33
Message-ID: CAB7nPqQ5i3-ccXuPBu6dzP=TRDkTrun++fF4kgAYzvthrmjQyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 5, 2014 at 11:58 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> So if we specify both this produces the exact opposite of the default,
> default being an empty string inserted for a quoted empty string and
> NULL inserted for a non-quoted empty string. So yes I'm fine with a
> note on the docs about that, and some more regression tests.

For people who did not get this one, here is a short example:
=# ¥pset null 'null'
Null display (null) is "null".
=# create table aa (a text);
CREATE TABLE
=# COPY aa FROM STDIN WITH (FORMAT csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ""
>>
>> \.
=# select * from aa;
a
------

null
(2 rows)
=# truncate aa;
TRUNCATE TABLE
Time: 12.149 ms
=# COPY aa FROM STDIN
WITH (FORMAT csv, FORCE_NULL(a), FORCE_NOT_NULL(a));
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ""
>>
>> \.
Time: 3776.662 ms
=# select * from aa;
a
------
null

(2 rows)
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-05 15:09:48
Message-ID: 3732.1394032188@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 03/05/2014 09:11 AM, Michael Paquier wrote:
>> After testing this feature, I noticed that FORCE_NULL and
>> FORCE_NOT_NULL can both be specified with COPY on the same column.

> Strictly they are not actually contradictory, since FORCE NULL relates
> to quoted null strings and FORCE NOT NULL relates to unquoted null
> strings. Arguably the docs are slightly loose on this point. Still,
> applying both FORCE NULL and FORCE NOT NULL to the same column would be
> rather perverse, since it would result in a quoted null string becoming
> null and an unquoted null string becoming not null.

Given the remarkable lack of standardization of "CSV" output, who's
to say that there might not be data sources out there for which this
is the desired behavior? It's weird, I agree, but I think throwing
an error for the combination is not going to be helpful. It's not
like somebody might accidentally write both on the same column.

+1 for clarifying the docs, though, more or less in the words you
used above.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-03-07 08:08:54
Message-ID: CAB7nPqR_2qL_sQnUG41y56CSkc5VrhNn7RAceXkR32n+o7vCVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 03/05/2014 09:11 AM, Michael Paquier wrote:
>>> After testing this feature, I noticed that FORCE_NULL and
>>> FORCE_NOT_NULL can both be specified with COPY on the same column.
>
>> Strictly they are not actually contradictory, since FORCE NULL relates
>> to quoted null strings and FORCE NOT NULL relates to unquoted null
>> strings. Arguably the docs are slightly loose on this point. Still,
>> applying both FORCE NULL and FORCE NOT NULL to the same column would be
>> rather perverse, since it would result in a quoted null string becoming
>> null and an unquoted null string becoming not null.
>
> Given the remarkable lack of standardization of "CSV" output, who's
> to say that there might not be data sources out there for which this
> is the desired behavior? It's weird, I agree, but I think throwing
> an error for the combination is not going to be helpful. It's not
> like somebody might accidentally write both on the same column.
>
> +1 for clarifying the docs, though, more or less in the words you
> used above.
Following that, I have hacked the patch attached to update the docs
with an additional regression test (actually replaces a test that was
the same as the one before in copy2).

I am attaching as well a second patch for file_fdw, to allow the use
of force_null and force_not_null on the same column, to be consistent
with COPY.
Regards,
--
Michael

Attachment Content-Type Size
20140307_file_fdw_loose_null.patch text/x-diff 5.1 KB
20140307_force_null_docs.patch text/x-diff 3.2 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-04-22 20:06:33
Message-ID: 20140422200633.GD10046@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 5, 2014 at 09:49:30PM +0900, Michael Paquier wrote:
> On Wed, Mar 5, 2014 at 7:44 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > I have picked this up and committed the patch. Thanks to all.
> Sorry for coming after the battle, but while looking at what has been
> committed I noticed that copy2.sql is actually doing twice in a row
> the same test:
> COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv,
> FORCE_NOT_NULL(b), FORCE_NULL(c));
> 1,,""
> \.
> -- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
> COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv,
> FORCE_NOT_NULL(b), FORCE_NULL(c));
> 2,,""
> \.
>
> See? For both tests the quotes are placed on the same column, the 3rd.
> I think that one of them should be like that, with the quotes on the
> 2nd column => 2,"",
> The attached patch corrects that... and a misplaced comment.
> Regards,

Thanks. Patch applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-04-22 20:07:08
Message-ID: 20140422200708.GE10046@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 7, 2014 at 05:08:54PM +0900, Michael Paquier wrote:
> On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >> On 03/05/2014 09:11 AM, Michael Paquier wrote:
> >>> After testing this feature, I noticed that FORCE_NULL and
> >>> FORCE_NOT_NULL can both be specified with COPY on the same column.
> >
> >> Strictly they are not actually contradictory, since FORCE NULL relates
> >> to quoted null strings and FORCE NOT NULL relates to unquoted null
> >> strings. Arguably the docs are slightly loose on this point. Still,
> >> applying both FORCE NULL and FORCE NOT NULL to the same column would be
> >> rather perverse, since it would result in a quoted null string becoming
> >> null and an unquoted null string becoming not null.
> >
> > Given the remarkable lack of standardization of "CSV" output, who's
> > to say that there might not be data sources out there for which this
> > is the desired behavior? It's weird, I agree, but I think throwing
> > an error for the combination is not going to be helpful. It's not
> > like somebody might accidentally write both on the same column.
> >
> > +1 for clarifying the docs, though, more or less in the words you
> > used above.
> Following that, I have hacked the patch attached to update the docs
> with an additional regression test (actually replaces a test that was
> the same as the one before in copy2).
>
> I am attaching as well a second patch for file_fdw, to allow the use
> of force_null and force_not_null on the same column, to be consistent
> with COPY.
> Regards,

Correction, this is the patch applied, not the earlier version.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Payal Singh <payal(at)omniti(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Date: 2014-04-22 21:53:54
Message-ID: CAB7nPqSZdpDV0-w_3qrTWLHdFHnLb7ZsJc3okbnb4818ji_VnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 23, 2014 at 5:07 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Fri, Mar 7, 2014 at 05:08:54PM +0900, Michael Paquier wrote:
> > On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > >> On 03/05/2014 09:11 AM, Michael Paquier wrote:
> > >>> After testing this feature, I noticed that FORCE_NULL and
> > >>> FORCE_NOT_NULL can both be specified with COPY on the same column.
> > >
> > >> Strictly they are not actually contradictory, since FORCE NULL relates
> > >> to quoted null strings and FORCE NOT NULL relates to unquoted null
> > >> strings. Arguably the docs are slightly loose on this point. Still,
> > >> applying both FORCE NULL and FORCE NOT NULL to the same column would
> be
> > >> rather perverse, since it would result in a quoted null string
> becoming
> > >> null and an unquoted null string becoming not null.
> > >
> > > Given the remarkable lack of standardization of "CSV" output, who's
> > > to say that there might not be data sources out there for which this
> > > is the desired behavior? It's weird, I agree, but I think throwing
> > > an error for the combination is not going to be helpful. It's not
> > > like somebody might accidentally write both on the same column.
> > >
> > > +1 for clarifying the docs, though, more or less in the words you
> > > used above.
> > Following that, I have hacked the patch attached to update the docs
> > with an additional regression test (actually replaces a test that was
> > the same as the one before in copy2).
> >
> > I am attaching as well a second patch for file_fdw, to allow the use
> > of force_null and force_not_null on the same column, to be consistent
> > with COPY.
> > Regards,
>
> Correction, this is the patch applied, not the earlier version.
>
Thanks for taking the time to look at that.
--
Michael