Re: Extending COPY TO

Lists: pgsql-hackers
From: Andrea Riciputi <andrea(dot)riciputi(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Extending COPY TO
Date: 2014-09-23 06:49:16
Message-ID: 178A22CE-45D3-452D-926E-AFC4904ECF3B@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,
it’s my first time here, so please let me know if I’m doing something wrong. I’m a developer, heavy PG user, but I’ve never hacked it before. Last week at work we had to produce a quite big CSV data file which should be used as input by another piece of software.

Since the file must be produced on a daily basis, is big, and it contains data stored in our PG database, letting PG produce the file itself seemed the right approach. Unfortunately the target software is, let say, “legacy” software and can only accept CRLF as EOL character. Since our PG is installed on a Linux server COPY TO results in a CSV file with LF as EOL, forcing us to pass the file a second time to convert EOL, which is inconvenient.

My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keep it simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour when no EOL option is given. I was also wondering if an EOL option could be useful also for the text output format or not.

I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself, submit the patch here and wait for your review. However before starting this in my spare time I wanted to know if you, as the PG hackers community, would be against a similar proposal for any reason, and if so why.

Thanks in advance,
Andrea


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andrea Riciputi <andrea(dot)riciputi(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extending COPY TO
Date: 2014-09-23 06:53:26
Message-ID: 542118E6.1050504@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/23/2014 09:49 AM, Andrea Riciputi wrote:
> My idea was to extend the COPY TO command to accept an EOL option as
> it already does with the DELIMITER option. To keep it simple we can
> limit the EOL choice to CR, LF or CRLF to avoid unusual output, and
> also keep the current behaviour when no EOL option is given. I was
> also wondering if an EOL option could be useful also for the text
> output format or not.

I don't think we want to go down that path. There are plenty of options
in COPY already, and the more you add, the more complicated it gets. And
we're never going to be able to satisfy everyone's needs.

I'd suggest doing:

COPY table TO PROGRAM 'unix2dos > /tmp/file'

- Heikki


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrea Riciputi <andrea(dot)riciputi(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extending COPY TO
Date: 2014-09-23 06:56:19
Message-ID: 20140923065619.GU16422@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrea,

* Andrea Riciputi (andrea(dot)riciputi(at)gmail(dot)com) wrote:
> My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keep it simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour when no EOL option is given. I was also wondering if an EOL option could be useful also for the text output format or not.

Have you considered using COPY TO's 'PROGRAM' option to simply pipe the
output through unix2dos..?

> I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself, submit the patch here and wait for your review. However before starting this in my spare time I wanted to know if you, as the PG hackers community, would be against a similar proposal for any reason, and if so why.

I'm not particularly against it, though if it can be solved with the
existing 'PROGRAM' capability then it may not make sense to complicate
the COPY code further.

Thanks!

Stephen


From: Andrea Riciputi <andrea(dot)riciputi(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Extending COPY TO
Date: 2014-09-24 06:23:34
Message-ID: AEDF74C3-5743-4FC6-9CC1-1B52336B7C22@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,
thanks for all your answers, I see your point. And I also understand the argument according to which there always be some other use case to satisfy. However your suggestion to use COPY TO sql TO PROGRAM doesn’t seem to me to fit well the use case I have in mind.

Imagine you access PG from an application written in the language X using a driver library, both your application and your PG instance run on two different hosts. Now using COPY TO sql PROGRAM the output file ends up on the PG host filesystem, while using COPY TO sql STDOUT and passing a file descriptor to PG via the driver library the output file ends up on the application hosts, which is much more convenient from the application point of view.

Sure you can always fix this setting up some kind of shared filesystem, but this is just the first of the issues I could think of. What about the potential I/O errors that could happen while opening/writing the output file? I should replicate them back from the PG host to the application layer, and this is something I’m pretty sure no one wants to go down.

So adding such a feature to PG itself seems to me still the best trade off between complexity and convenience. However, if you are strongly against it, or see a better way to get around this problem, please let me know. As I wrote before, despite being an heavy PG user, it’s my first time on the hackers ML and I don’t want to seem disrespectful of the community.

Thanks,
Andrea

On 23 Sep 2014, at 08:56, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Andrea,
>
> * Andrea Riciputi (andrea(dot)riciputi(at)gmail(dot)com) wrote:
>> My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keep it simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour when no EOL option is given. I was also wondering if an EOL option could be useful also for the text output format or not.
>
> Have you considered using COPY TO's 'PROGRAM' option to simply pipe the
> output through unix2dos..?
>
>> I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself, submit the patch here and wait for your review. However before starting this in my spare time I wanted to know if you, as the PG hackers community, would be against a similar proposal for any reason, and if so why.
>
> I'm not particularly against it, though if it can be solved with the
> existing 'PROGRAM' capability then it may not make sense to complicate
> the COPY code further.
>
> Thanks!
>
> Stephen


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andrea Riciputi <andrea(dot)riciputi(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Extending COPY TO
Date: 2014-09-24 10:17:13
Message-ID: 54229A29.9070908@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/24/2014 09:23 AM, Andrea Riciputi wrote:
> Imagine you access PG from an application written in the language X
> using a driver library, both your application and your PG instance
> run on two different hosts.

In that scenario, you'll be using the PQgetCopyData function to get the
data. PQgetCopyData returns one row at a time; the application can
trivially change the line-ending to whatever it wants, when writing the
output to a file or wherever it goes.

> As I wrote before, despite being an heavy PG user, it’s my first time
> on the hackers ML and I don’t want to seem disrespectful of the
> community.

No worries; thanks for effort, even if this idea doesn't pan out.

- Heikki