CSV patch applied

Lists: pgsql-hackerspgsql-patches
From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: CSV patch applied
Date: 2004-04-19 17:26:52
Message-ID: 200404191726.i3JHQrs16216@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I have applied the attached patch that complete TODO item:

o -Allow dump/load of CSV format

This adds new keywords to COPY and \copy:

CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columns
LITERAL - prevent NULL checks for specific columns

If folks have better ideas for naming of those keywords, I am all ears!

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 44.2 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: CSV patch applied
Date: 2004-04-20 10:05:30
Message-ID: 200404201205.30693.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> I have applied the attached patch that complete TODO item:
>
> o -Allow dump/load of CSV format
>
> This adds new keywords to COPY and \copy:
>
> CSV - enable CSV mode
> QUOTE - specify quote character
> ESCAPE - specify escape character
> FORCE - force quoting of specified columns

FORCE QUOTE

> LITERAL - prevent NULL checks for specific columns

NO NULL CHECK

> If folks have better ideas for naming of those keywords, I am all
> ears!


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: COPY CSV keywords
Date: 2004-04-20 13:22:00
Message-ID: 200404201322.i3KDM0a04241@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


[ discussion moved to hackers.]

Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I have applied the attached patch that complete TODO item:
> >
> > o -Allow dump/load of CSV format
> >
> > This adds new keywords to COPY and \copy:
> >
> > CSV - enable CSV mode
> > QUOTE - specify quote character
> > ESCAPE - specify escape character
> > FORCE - force quoting of specified columns
>
> FORCE QUOTE
>
> > LITERAL - prevent NULL checks for specific columns
>
> NO NULL CHECK

I considered going this direction, but it broke the WITH clause style of
COPY. Previously it was "WITH keyword value". Now it is also "WITH
keyword value, value" too. This would add "WITH keyword keyword value,
value".

It would change:

COPY pg_language TO '/tmp/x' WITH CSV FORCE lanname
LITERAL lanacl
to:
COPY pg_language TO '/tmp/x' WITH CSV FORCE QUOTE lanname NO
NULL CHECK lanacl

If folks like it, I can make the change.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: CSV patch applied
Date: 2004-04-20 13:39:58
Message-ID: 4085282E.1040202@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:

>Bruce Momjian wrote:
>
>
>>I have applied the attached patch that complete TODO item:
>>
>> o -Allow dump/load of CSV format
>>
>>This adds new keywords to COPY and \copy:
>>
>> CSV - enable CSV mode
>> QUOTE - specify quote character
>> ESCAPE - specify escape character
>> FORCE - force quoting of specified columns
>>
>>
>
>FORCE QUOTE
>
>
>
>> LITERAL - prevent NULL checks for specific columns
>>
>>
>
>NO NULL CHECK
>
>
>
>>If folks have better ideas for naming of those keywords, I am all
>>ears!
>>
>>

Bruce and I tossed this around quite a bit.

The problem is that using QUOTE or NULL in these phrases might look
confusing, e.g.
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;

I also don't think NO NULL CHECK actually matches the use case for this
very well (and I'm dubious about LITERAL too). They both describe the
actual behaviour, but not what you are trying to achieve. Essentially,
this would be used when you have a field with a NOT NULL constraint, but
the input CSV data stream has what would otherwise be considered nulls.
(COPY itself will never produce such a CSV, as non-null values that
resemble null are always quoted, but third party programs well might.)
So an alternative might be FORCE NOT NULL, but for the previous
consideration. Perhaps use of an optional preposition might make things
slightly clearer, e.g.:

COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
field1,field2;

But it does start to look a little too much like COBOL :-).

So I'm interested to see if there are any other inspirations people have.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY CSV keywords
Date: 2004-04-20 14:03:51
Message-ID: 40852DC7.7010209@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:

>[ discussion moved to hackers.]
>
>Peter Eisentraut wrote:
>
>
>>Bruce Momjian wrote:
>>
>>
>>>I have applied the attached patch that complete TODO item:
>>>
>>> o -Allow dump/load of CSV format
>>>
>>>This adds new keywords to COPY and \copy:
>>>
>>> CSV - enable CSV mode
>>> QUOTE - specify quote character
>>> ESCAPE - specify escape character
>>> FORCE - force quoting of specified columns
>>>
>>>
>>FORCE QUOTE
>>
>>
>>
>>> LITERAL - prevent NULL checks for specific columns
>>>
>>>
>>NO NULL CHECK
>>
>>
>
>I considered going this direction, but it broke the WITH clause style of
>COPY. Previously it was "WITH keyword value". Now it is also "WITH
>keyword value, value" too. This would add "WITH keyword keyword value,
>value".
>
>It would change:
>
> COPY pg_language TO '/tmp/x' WITH CSV FORCE lanname
> LITERAL lanacl
>to:
> COPY pg_language TO '/tmp/x' WITH CSV FORCE QUOTE lanname NO
> NULL CHECK lanacl
>
>If folks like it, I can make the change.
>
>
>
[reposting my comments from patches]

The problem is that using QUOTE or NULL in these phrases might look
confusing, e.g.
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;

I also don't think NO NULL CHECK actually matches the use case for this
very well (and I'm dubious about LITERAL too). They both describe the
actual behaviour, but not what you are trying to achieve. Essentially,
this would be used when you have a field with a NOT NULL constraint, but
the input CSV data stream has what would otherwise be considered nulls.
(COPY itself will never produce such a CSV, as non-null values that
resemble null are always quoted, but third party programs well might.)
So an alternative might be FORCE NOT NULL, but for the previous
consideration. Perhaps use of an optional preposition might make things
slightly clearer, e.g.:

COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
field1,field2;

But it does start to look a little too much like COBOL .

So I'm interested to see if there are any other inspirations people have.

cheers

andrew


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: CSV patch applied
Date: 2004-04-20 14:25:41
Message-ID: Pine.LNX.4.58.0404201612500.6454@sablons.cri.ensmp.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> >> CSV - enable CSV mode
> >> QUOTE - specify quote character
> >> ESCAPE - specify escape character
> >> FORCE - force quoting of specified columns
> >
> >FORCE QUOTE

QUOTING col1,col2?
QUOTED col1,col2?
IN QUOTES col1,cold

> >> LITERAL - prevent NULL checks for specific columns
> >
> >NO NULL CHECK

QUOTED (meaning 'as quoted')?

From a language design point of view, I think it may be better to stick
to one word versions?

--
Fabien Coelho - coelho(at)cri(dot)ensmp(dot)fr


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] CSV patch applied
Date: 2004-04-20 14:33:49
Message-ID: 200404201433.i3KEXnq19954@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan wrote:
> Bruce and I tossed this around quite a bit.
>
> The problem is that using QUOTE or NULL in these phrases might look
> confusing, e.g.
> COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
> COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;
>
> I also don't think NO NULL CHECK actually matches the use case for this
> very well (and I'm dubious about LITERAL too). They both describe the
> actual behaviour, but not what you are trying to achieve. Essentially,
> this would be used when you have a field with a NOT NULL constraint, but
> the input CSV data stream has what would otherwise be considered nulls.
> (COPY itself will never produce such a CSV, as non-null values that
> resemble null are always quoted, but third party programs well might.)
> So an alternative might be FORCE NOT NULL, but for the previous
> consideration. Perhaps use of an optional preposition might make things
> slightly clearer, e.g.:
>
> COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
> field1,field2;
> COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
> field1,field2;
>
> But it does start to look a little too much like COBOL :-).

Yea, and as I remember, that COBOL word scared us off. :-)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY CSV keywords
Date: 2004-04-20 14:41:32
Message-ID: 18851.1082472092@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Peter Eisentraut wrote:
>> Bruce Momjian wrote:
>>> LITERAL - prevent NULL checks for specific columns
>>
>> NO NULL CHECK

> I considered going this direction, but it broke the WITH clause style of
> COPY.

Huh? Many of the other options are already potentially two words,
eg QUOTE [AS], so I don't see how you can object to that aspect.

I agree with Peter that "LITERAL" is a completely unhelpful name for the
functionality.

"NO NULL CHECK" seems a bit confusing as well --- at first glance it
looks like it might mean overriding the column's NOT NULL attribute
(which I trust is not what it means ;-)). Perhaps "NO NULLS" ?
Or just "NOT NULL", to avoid introducing another keyword.

I liked the "FORCE QUOTE" suggestion, too.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY CSV keywords
Date: 2004-04-20 14:49:34
Message-ID: 200404201449.i3KEnYA22335@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Peter Eisentraut wrote:
> >> Bruce Momjian wrote:
> >>> LITERAL - prevent NULL checks for specific columns
> >>
> >> NO NULL CHECK
>
> > I considered going this direction, but it broke the WITH clause style of
> > COPY.
>
> Huh? Many of the other options are already potentially two words,
> eg QUOTE [AS], so I don't see how you can object to that aspect.

AS was a noise word, while the new keywords would not be.

> I agree with Peter that "LITERAL" is a completely unhelpful name for the
> functionality.

Yep.

> "NO NULL CHECK" seems a bit confusing as well --- at first glance it
> looks like it might mean overriding the column's NOT NULL attribute
> (which I trust is not what it means ;-)). Perhaps "NO NULLS" ?

Right. It doesn't affect the constraint.

> Or just "NOT NULL", to avoid introducing another keyword.

Yes, I liked that, but to me it looked like we were saying the column
was NOT NULL, which isn't the meaning. You could almost call it QUOTE
NULL because it is actually quoting the NULL strings in those columns.

> I liked the "FORCE QUOTE" suggestion, too.

OK.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY CSV keywords
Date: 2004-04-20 15:04:24
Message-ID: 40853BF8.4060304@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I wrote:

>
> The problem is that using QUOTE or NULL in these phrases might look
> confusing, e.g.
> COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
> COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;
>
> I also don't think NO NULL CHECK actually matches the use case for
> this very well (and I'm dubious about LITERAL too). They both
> describe the actual behaviour, but not what you are trying to achieve.
> Essentially, this would be used when you have a field with a NOT NULL
> constraint, but the input CSV data stream has what would otherwise be
> considered nulls. (COPY itself will never produce such a CSV, as
> non-null values that resemble null are always quoted, but third party
> programs well might.) So an alternative might be FORCE NOT NULL, but
> for the previous consideration. Perhaps use of an optional preposition
> might make things slightly clearer, e.g.:
>
> COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
> field1,field2;
> COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
> field1,field2;
>
> But it does start to look a little too much like COBOL .
>
> So I'm interested to see if there are any other inspirations people have.

The other alternative for the NOT NULL side would be to abandon it and
tell users they would need to use a trigger. That requires a little more
work from them, but would work in the general case, whereas this one is
likely to fail on everything but a text-and-friends column. Example:

andrew=# create table b(a int not null default 0);
CREATE TABLE
andrew=# create function bnull() returns trigger language plpgsql as $$
andrew$# begin
andrew$# if NEW.a is null then NEW.a := 0; end if;
andrew$# return NEW;
andrew$# end;
andrew$# $$;
CREATE FUNCTION
andrew=# create trigger btrigger before insert on b FOR EACH ROW EXECUTE
PROCEDURE bnull();
CREATE TRIGGER
andrew=# copy b from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>>
>> \.
andrew=# select * from b
andrew-# ;
a
---
0
(1 row)

But there isn't a reasonable alternative to making the user make choices
on the output side (see previous discussion regarding zip codes).

cheers

andrew


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: CSV patch applied
Date: 2004-04-20 15:41:09
Message-ID: 200404201541.i3KFf9S12748@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Fabien COELHO wrote:
>
> > >> CSV - enable CSV mode
> > >> QUOTE - specify quote character
> > >> ESCAPE - specify escape character
> > >> FORCE - force quoting of specified columns
> > >
> > >FORCE QUOTE
>
> QUOTING col1,col2?
> QUOTED col1,col2?
> IN QUOTES col1,cold
>
> > >> LITERAL - prevent NULL checks for specific columns
> > >
> > >NO NULL CHECK
>
> QUOTED (meaning 'as quoted')?
>
> From a language design point of view, I think it may be better to stick
> to one word versions?

I understand, and we tried that. The confusion is that FORCE forces
quotes on non-NULL values, while LITERAL forces quotes on NULL values,
so while both deal with quoting, their functionality is different for
input and output. They same single keyword just seemed confusing.

This is why the two-keyword idea sounds good to me --- we use the word
QUOTE (already a keyword), and then say FORCE for output, and NULL for
input:

FORCE QUOTE
QUOTE NULL

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073