Re: Fwd: Copy out wording

Lists: pgadmin-hackerspgsql-hackers
From: Magnus Hagander <magnus(at)hagander(dot)net>
To: pgadmin-hackers <pgadmin-hackers(at)postgresql(dot)org>
Subject: Copy out wording
Date: 2009-09-01 12:51:07
Message-ID: 9837222c0909010551lfa4a4f5j736cabc8d9c540ff@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL string?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Quan Zongliang <quanzongliang(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgadmin-hackers <pgadmin-hackers(at)postgresql(dot)org>
Subject: Re: Copy out wording
Date: 2009-09-03 03:48:55
Message-ID: 20090903113451.5958.4125B4E5@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

> Shouldn't that be:
> "A NULL is output as the NULL string and is not quoted, while a data
> value matching the empty string is quoted"?
sure, the document seems to has mistake.

and i parse csv format with same way in import function

-----------------------------------------------
Quan Zongliang
quanzongliang(at)gmail(dot)com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Fwd: Copy out wording
Date: 2009-09-03 07:52:45
Message-ID: 9837222c0909030052j546aaf4bs6348aed3d08f927f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Crap, I just realized I sent to pgadmin hackers by mystake. Meh.

Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
 The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL string?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-03 11:19:07
Message-ID: 4A9FA62B.7060808@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Magnus Hagander wrote:
> Our documentation for COPY
> (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
> following to say:
> "
> The CSV format has no standard way to distinguish a NULL value from
> an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
> output as the NULL string and is not quoted, while a data value
> matching the NULL string is quoted. Therefore, using the default
> settings, a NULL is written as an unquoted empty string, while an
> empty string is written with double quotes (""). Reading values
> follows similar rules. You can use FORCE NOT NULL to prevent NULL
> input comparisons for specific columns.
> "
>
> Shouldn't that be:
> "A NULL is output as the NULL string and is not quoted, while a data
> value matching the empty string is quoted"?
>
> If not, then what really is the difference between a NULL and a NULL string?
>
>

No, it shouldn't. Let's say NULL is represented as "foo". Then a null
between delimiters will be written as

delimiter foo delimiter

while the string "foo" will be

delimiter quotechar foo quotechar delimiter

and an empty non-null string will be

delimiter delimiter

unless you have FORCE QUOTE on for it, in which case it will be

delimiter quotechar quotechar delimiter

We had quite a bit of debate on the shape of CSV output at the time it
was done (during 8.0), and that's what we came up with. It has the
useful property that we can round-trip the data, i.e. we can read back
the data we output without losing information about nulls, no matter
what the NULL string is, something we have always been resistant to
changing.

If you think we could explain it better, by all means have a go at it.
But your proposed change isn't accurate. Here is an illustration of the
above:

andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to
stdout null 'foo' csv header;
column1,column2,column3
1,"foo",2
3,foo,4
5,,6
andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to
stdout null 'foo' csv header force quote column2;
column1,column2,column3
1,"foo",2
3,foo,4
5,"",6

HTH

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-03 11:22:37
Message-ID: 9837222c0909030422t68352befqb6203ab9fcdf7815@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

On Thu, Sep 3, 2009 at 13:19, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> Our documentation for COPY
>> (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
>> following to say:
>> "
>>  The CSV format has no standard way to distinguish a NULL value from
>> an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
>> output as the NULL string and is not quoted, while a data value
>> matching the NULL string is quoted. Therefore, using the default
>> settings, a NULL is written as an unquoted empty string, while an
>> empty string is written with double quotes (""). Reading values
>> follows similar rules. You can use FORCE NOT NULL to prevent NULL
>> input comparisons for specific columns.
>> "
>>
>> Shouldn't that be:
>> "A NULL is output as the NULL string and is not quoted, while a data
>> value matching the empty string is quoted"?
>>
>> If not, then what really is the difference between a NULL and a NULL
>> string?
>>
>>
>
>
> No, it shouldn't. Let's say NULL is represented as "foo". Then a null
> between delimiters will be written as
>
>   delimiter foo delimiter
>
> while the string "foo" will be
>
>   delimiter quotechar foo quotechar delimiter
>
> and an empty non-null string will be
>
>   delimiter delimiter
>
> unless you have FORCE QUOTE on for it, in which case it will be
>
>   delimiter quotechar quotechar delimiter
>
>
> We had quite a bit of debate on the shape of CSV output at the time it was
> done (during 8.0), and that's what we came up with. It has the useful
> property that we can round-trip the data, i.e. we can read back the data we
> output without losing information about nulls, no matter what the NULL
> string is, something we have always been resistant to changing.
>
> If you think we could explain it better, by all means have a go at it. But
> your proposed change isn't accurate. Here is an illustration of the above:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

(FWIW, I totally agree with the feature, I was just confused by the docs)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-03 11:31:40
Message-ID: 4A9FA91C.90207@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Magnus Hagander wrote:
> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
> part of the data? I read it as "a string being NULL". Maybe something
> like "the value of the NULL string parameter" to be overly clear for
> people like me? :-)
>

We could change:

A NULL is output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted.

to

A NULL is output as the NULL parameter and is not quoted, while a non-NULL data value whose text representation
matches the NULL parameter is quoted.

or something similar. Would that be better?

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-03 11:32:24
Message-ID: 9837222c0909030432i6b873521m7121d3f012eeb11a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
>> part of the data? I read it as "a string being NULL". Maybe something
>> like "the value of the NULL string parameter" to be overly clear for
>> people like me? :-)
>>
>
> We could change:
>
>   A NULL is output as the NULL string and is not quoted, while a data value
>   matching the NULL string is quoted.
>
>
> to
>
>   A NULL is output as the NULL parameter and is not quoted, while a non-NULL
> data value whose text representation
>   matches the NULL parameter is quoted.
>
>
> or something similar. Would that be better?

Yes, much better IMO.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-17 21:13:36
Message-ID: 200909172113.n8HLDaV08216@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Magnus Hagander wrote:
> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> >
> >
> > Magnus Hagander wrote:
> >>
> >> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
> >> part of the data? I read it as "a string being NULL". Maybe something
> >> like "the value of the NULL string parameter" to be overly clear for
> >> people like me? :-)
> >>
> >
> > We could change:
> >
> > ? A NULL is output as the NULL string and is not quoted, while a data value
> > ? matching the NULL string is quoted.
> >
> >
> > to
> >
> > ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
> > data value whose text representation
> > ? matches the NULL parameter is quoted.
> >
> >
> > or something similar. Would that be better?
>
> Yes, much better IMO.

I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 1.6 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-17 21:29:46
Message-ID: 4AB2AA4A.6080403@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Bruce Momjian wrote:
> Magnus Hagander wrote:
>
>> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>
>>> Magnus Hagander wrote:
>>>
>>>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
>>>> part of the data? I read it as "a string being NULL". Maybe something
>>>> like "the value of the NULL string parameter" to be overly clear for
>>>> people like me? :-)
>>>>
>>>>
>>> We could change:
>>>
>>> ? A NULL is output as the NULL string and is not quoted, while a data value
>>> ? matching the NULL string is quoted.
>>>
>>>
>>> to
>>>
>>> ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
>>> data value whose text representation
>>> ? matches the NULL parameter is quoted.
>>>
>>>
>>> or something similar. Would that be better?
>>>
>> Yes, much better IMO.
>>
>
> I have applied the attached documentation clarification patch, and
> backpatched it to 8.4.X.
>
>
>

Why didn't you follow the wording I actually suggested, which had the
virtue of being accurate.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Copy out wording
Date: 2009-09-17 21:50:07
Message-ID: 200909172150.n8HLo7R28999@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgadmin-hackers pgsql-hackers

Andrew Dunstan wrote:
>
>
> Bruce Momjian wrote:
> > Magnus Hagander wrote:
> >
> >> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> >>
> >>> Magnus Hagander wrote:
> >>>
> >>>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
> >>>> part of the data? I read it as "a string being NULL". Maybe something
> >>>> like "the value of the NULL string parameter" to be overly clear for
> >>>> people like me? :-)
> >>>>
> >>>>
> >>> We could change:
> >>>
> >>> ? A NULL is output as the NULL string and is not quoted, while a data value
> >>> ? matching the NULL string is quoted.
> >>>
> >>>
> >>> to
> >>>
> >>> ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
> >>> data value whose text representation
> >>> ? matches the NULL parameter is quoted.
> >>>
> >>>
> >>> or something similar. Would that be better?
> >>>
> >> Yes, much better IMO.
> >>
> >
> > I have applied the attached documentation clarification patch, and
> > backpatched it to 8.4.X.
> >
> >
> >
>
> Why didn't you follow the wording I actually suggested, which had the
> virtue of being accurate.

I thought the problem was the use of the word "null string", which
clearly was confusing.

I have updated the docs to use your wording, with a little
clarification. The diff against yesterday's CVS is attached, which is
not smaller.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff3 text/x-diff 1.9 KB