Re: Win32 patch for COPY

Lists: pgsql-patches
From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Win32 patch for COPY
Date: 2003-04-18 03:17:20
Message-ID: 200304180317.h3I3HK713174@candle.pha.pa.us

Here is a patch to allow COPY FROM to accept line terminators of \r, \n,
and \r\n, and for COPY TO to output \r\n on Win32.

CHANGES FROM PREVIOUS BEHAVIOR:

o We used to allow a literal carriage return as a data value,
while this patch will assume it is a line terminator.

This was not documented in the COPY manual page, and COPY never output
it, but it was accepted. In 7.4 it will not be. You can still supply a
carriage return as \r or as backslash-carriage-return.

One trick was to prevent silently ignoring carriage returns at the end
of a line in non-\r\n files. The solution was to create a has_crnl
variable that is set from the first copy line --- if it is false, a
literal carriage return found as a data value will throw an error, while
a newline without a preceding carriage return also throws an error.
Backslash-literal still works fine. Literal carriage returns or line
feeds not at the end of a line will cause the next line to have the
incorrect number of fields which will throw an error.
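
The has_crnl rule can be sketched roughly as follows. This is an
illustration in Python, not the C code in copy.c. The helper names are
hypothetical, and the reading of the rule (a literal carriage return is
rejected when has_crnl is false, a bare newline is rejected when it is
true) is an assumption. For brevity the sketch covers only \n and \r\n
files, not files terminated by \r alone.

```python
def detect_has_crnl(first_line: bytes) -> bool:
    # Hypothetical helper: True if the first COPY line ends in CR LF.
    return first_line.endswith(b"\r\n")


def check_line(raw_line: bytes, has_crnl: bool) -> bytes:
    # raw_line is one input line including its terminator.
    # Returns the line without the terminator, or raises ValueError
    # when the terminator does not match what the first line used.
    if has_crnl:
        if not raw_line.endswith(b"\r\n"):
            raise ValueError("newline without a preceding carriage return")
        return raw_line[:-2]

    if not raw_line.endswith(b"\n"):
        raise ValueError("line is not terminated by a newline")
    body = raw_line[:-1]
    # In a non-CR-LF file a trailing CR must not be silently dropped,
    # and a literal CR inside the data is no longer accepted.
    if b"\r" in body:
        raise ValueError("literal carriage return found in data")
    return body
```

A backslash followed by the letter r is two ordinary bytes, so an escaped
carriage return passes through this check untouched.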

Even single-line COPY tables are properly checked when using
STDIN/STDOUT because the \. must also terminate consistently.

Another change is that Win32 will output COPY files as native \r\n,
rather than \n. Of course, this can be loaded into non-Win32 too.
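
As a rough illustration of the output side, a writer can pick the
terminator once per platform and append it to every row. The sketch below
is Python, not the backend C code. The pgeol and format_row helpers and
the windows flag are hypothetical stand-ins, and escaping of special
characters inside field values is left out.

```python
def pgeol(windows: bool) -> str:
    # Hypothetical stand-in for the platform end-of-line value:
    # CR LF on Win32, LF elsewhere.
    return "\r\n" if windows else "\n"


def format_row(fields: list[str], windows: bool) -> str:
    # Join the fields with tabs and append the platform terminator.
    return "\t".join(fields) + pgeol(windows)
```

A file written this way with windows=True ends every row in \r\n, which
the input side described above would then accept on any platform.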

Should we be outputting \r for OS X?

The good news is that copy.c is the only place where EOL still needs to
be dealt with. Other files are either opened in text mode (meaning they
can handle any end-of-line format) or aren't edited/created by users.
There is no need to change psql \copy because those files are opened in
text mode.

I also cleaned up the BinarySignature variable usage.

I have tested with \n and \r\n PGEOL values.

Docs updated.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 9.0 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Win32 patch for COPY
Date: 2003-04-18 23:45:43
Message-ID: 200304182345.h3INjhc04871@candle.pha.pa.us

Here is a new version of the patch. I realized that if you were loading
into a table that had only a single column, the tests wouldn't throw an
error on a literal \r, so I added code to record the end-of-line type
for the first line, and make sure the rest of the lines match.
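
The idea of recording the end-of-line type from the first line and
checking every later line against it might look like this. It is a Python
sketch under stated assumptions, not the patch itself: split_copy_lines is
a hypothetical name, and treating an unterminated last line as an error is
an assumption made to keep the example short.

```python
def split_copy_lines(data: str) -> list[str]:
    # Split on \r, \n, or \r\n, but require every line to use the
    # terminator that the first line used.
    lines = []
    expected = None  # end-of-line type recorded from the first line
    start = 0
    i = 0
    n = len(data)
    while i < n:
        ch = data[i]
        if ch not in "\r\n":
            i += 1
            continue
        # Identify which terminator begins at position i.
        if ch == "\r" and i + 1 < n and data[i + 1] == "\n":
            found = "\r\n"
        else:
            found = ch
        if expected is None:
            expected = found
        elif found != expected:
            raise ValueError(f"inconsistent line terminator at offset {i}")
        lines.append(data[start:i])
        i += len(found)
        start = i
    if start < n:
        # Assumption for this sketch: the last line must be terminated.
        raise ValueError("last line is not terminated")
    return lines
```

Because the check is on the terminator itself rather than on the field
count, a literal \r in a single-column \n file is caught as well.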

It is a little more code, but I think COPY is important enough to
cleanly handle this and be 100% reliable.

Attachment Content-Type Size
unknown_filename text/plain 9.5 KB