Re: [HACKERS] Unworkable column delimiter characters for COPY

Lists: pgsql-hackerspgsql-patches
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Unworkable column delimiter characters for COPY
Date: 2007-12-27 15:37:07
Message-ID: 5426.1198769827@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Currently, copy.c rejects newline, carriage return, and backslash as
settings for the column delimiter character (in non-CSV mode). These
all seem necessary to avoid confusion. However, I just noticed that the
letters r, n, t, etc would also not work: on output, data characters
matching such a delimiter would get escaped as \r, \n, etc, which on
input would be read as C-style control characters.

I think at minimum we need to forbid b, f, n, r, t, v, which are the
control character representations currently recognized by COPY.
But I'm tempted to make it reject all 26 lower-case ASCII letters,
as a form of future-proofing. Thoughts?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Unworkable column delimiter characters for COPY
Date: 2007-12-27 16:58:50
Message-ID: 4773D9CA.1030708@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Currently, copy.c rejects newline, carriage return, and backslash as
> settings for the column delimiter character (in non-CSV mode). These
> all seem necessary to avoid confusion. However, I just noticed that the
> letters r, n, t, etc would also not work: on output, data characters
> matching such a delimiter would get escaped as \r, \n, etc, which on
> input would be read as C-style control characters.
>
> I think at minimum we need to forbid b, f, n, r, t, v, which are the
> control character representations currently recognized by COPY.
> But I'm tempted to make it reject all 26 lower-case ASCII letters,
> as a form of future-proofing. Thoughts?
>
>

Assuming this is only for non-CSV mode, it seems OK.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unworkable column delimiter characters for COPY
Date: 2007-12-27 18:32:01
Message-ID: 18844.1198780321@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>> control character representations currently recognized by COPY.
>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>> as a form of future-proofing. Thoughts?

> Assuming this is only for non-CSV mode, it seems OK.

On looking closer, 'x', octal digits, and '.' would also be trouble.
So I made it reject a-z, 0-9, and dot.

It appears that the CSV mode is a few bricks shy of a load here as
well: it will let you do CSV DELIMITER '"' resulting in entirely
broken output. It seems we ought to forbid delimiter from matching CSV
quote or escape characters. I'll let you clean up that case though...

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unworkable column delimiter characters for COPY
Date: 2007-12-27 19:23:12
Message-ID: 4773FBA0.5000509@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>>> control character representations currently recognized by COPY.
>>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>>> as a form of future-proofing. Thoughts?
>>>
>
>
>> Assuming this is only for non-CSV mode, it seems OK.
>>
>
> On looking closer, 'x', octal digits, and '.' would also be trouble.
> So I made it reject a-z, 0-9, and dot.
>

I take it upper case A-F are safe, even though they are hex digits,
because they wouldn't immediately follow the backslash?

> It appears that the CSV mode is a few bricks shy of a load here as
> well: it will let you do CSV DELIMITER '"' resulting in entirely
> broken output. It seems we ought to forbid delimiter from matching CSV
> quote or escape characters. I'll let you clean up that case though...
>
>
>

Lucky me. Ok, I'll look at it. Should be simple enough.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unworkable column delimiter characters for COPY
Date: 2007-12-28 04:14:38
Message-ID: 4774782E.9030500@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>>> control character representations currently recognized by COPY.
>>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>>> as a form of future-proofing. Thoughts?
>>>
>
>
>> Assuming this is only for non-CSV mode, it seems OK.
>>
>
> On looking closer, 'x', octal digits, and '.' would also be trouble.
> So I made it reject a-z, 0-9, and dot.
>
> It appears that the CSV mode is a few bricks shy of a load here as
> well: it will let you do CSV DELIMITER '"' resulting in entirely
> broken output. It seems we ought to forbid delimiter from matching CSV
> quote or escape characters. I'll let you clean up that case though...
>

This should do the trick - I'll apply it tomorrow.

cheers

andrew

Index: copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.293
diff -c -r1.293 copy.c
*** copy.c 27 Dec 2007 18:28:58 -0000 1.293
--- copy.c 28 Dec 2007 04:07:06 -0000
***************
*** 889,894 ****
--- 889,907 ----
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be
\"%s\"", cstate->delim)));

+ /* In CSV mode, disallow quote or escape chars as delimiter */
+ if (cstate->csv_mode)
+ {
+ if (cstate->delim[0] == cstate->quote[0])
+ ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and
quote must be different")));
+ else if (cstate->delim[0] == cstate->escape[0])
+ ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and
escape must be different")));
+ }
+
/* Check header */
if (!cstate->csv_mode && cstate->header_line)
ereport(ERROR,


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unworkable column delimiter characters for COPY
Date: 2007-12-28 04:25:33
Message-ID: 25251.1198815933@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> It seems we ought to forbid delimiter from matching CSV
>> quote or escape characters. I'll let you clean up that case though...

> This should do the trick - I'll apply it tomorrow.

A couple thoughts:

* This test needs to appear further down --- it is not sensible until
after you've checked strlen() == 1 for all the strings involved.

* I see that we disallow the CSV quote character from appearing in the
null_print string, but not the escape character. Is this
correct/sensible? If it is correct, maybe the delimiter could also
match the escape character?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unworkable column delimiter characters for COPY
Date: 2007-12-28 17:14:42
Message-ID: 47752F02.3030807@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> I see that we disallow the CSV quote character from appearing in the
> null_print string, but not the escape character. Is this
> correct/sensible?

Yes, because:
. nulls are never quoted
. fields containing the quote char must be quoted
. the escape char is only magical inside quoted fields

> If it is correct, maybe the delimiter could also
> match the escape character?
>
>

Yes, probably. Crazy, but I think it's workable. I'll take that test
out, and just make sure that the delimiter is different from the quote.

cheers

andrew