Review : Add hooks for pre- and post-processor executables for COPY and \copy

Lists: pgsql-hackers
From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2012-12-18 12:42:25
Message-ID: 006801cddd1d$22773440$67659cc0$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Basic stuff:
------------
- Rebase of Patch is required.
- Compiles cleanly without any errors/warnings
- Regression tests pass.

What it does:
---------------------
This patch is useful when COPY command input/output are stored in
compression format or in any command/script uses these output/input in any
means; without generating intermediate temporary files.
This feature can be used in server side using "COPY statement" by
administrator. Or can be used in psql internal "\copy" command by any user.

Code Review comments:
---------------------

1. Modify the comment in function header of: parse_slash_copy (needs to
modify for new syntax)
2. Comments for functions OpenPipeStream & ClosePipeStream are missing.

3. Any Script errors are not directly visible to user; If there problems in
script no way to cleanup.

Shouldn't this be mentioned in User Manual.

Test case issues:
------------------
1. "Broken pipe" is not handled in case of psql "\copy" command;
Issue are as follows:
Following are verified on SuSE-Linux 10.2.
1) psql is exiting when "\COPY xxx TO" command is issued and
command/script is not found

When popen is called in write mode it is creating valid
file descriptor and when it tries to write to file "Broken pipe" error is
coming which is not handled.
psql# \copy pgbench_accounts TO PROGRAM
'../compress.sh pgbench_accounts4.txt'
2) When "\copy" command is in progress then program/command is
killed/"crashed due to any problem"
psql is exiting.

Script used in testcases:
------------------
1. compress.sh
echo 'cat > $1' > compress.sh
echo 'bzip2 -z $1' >> compress.sh
chmod +x compress.sh

2. decompress.sh
echo 'bzip2 -d -c -k $*' > decompress.sh
chmod +x decompress.sh

Testcases executed are attached with this mail.

With Regards,

Amit Kapila.

Attachment Content-Type Size
testcases.txt text/plain 3.3 KB

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-01-23 12:05:52
Message-ID: 00c801cdf961$fd888920$f8999b60$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Amit,

Thank you for your review. I've rebased and updated the patch. Please find
attached the patch.

> Code Review comments:
> ---------------------
>
> 1. Modify the comment in function header of: parse_slash_copy (needs to modify
for new syntax)

Done.

> 2. Comments for functions OpenPipeStream & ClosePipeStream are missing.

Done.

> 3. Any Script errors are not directly visible to user; If there problems in
script no way to cleanup.

> Shouldn't this be mentioned in User Manual.

Done. Please see the documentation note on the \copy instruction in
psql-ref.sgml.

> Test case issues:
> ------------------
> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> Issue are as follows:
> Following are verified on SuSE-Linux 10.2.
> 1) psql is exiting when "\COPY xxx TO" command is issued and
command/script is not found

> When popen is called in write mode it is creating valid file
descriptor and when it tries to write to file "Broken pipe" error is > coming
which is not handled.
> psql# \copy pgbench_accounts TO PROGRAM
'../compress.sh pgbench_accounts4.txt'
> 2) When "\copy" command is in progress then program/command is
killed/"crashed due to any problem"
> psql is exiting.

This is a headache. I have no idea how to solve this.

Sorry for the long delay in responding.

Best regards,

Etsuro Fujita

Attachment Content-Type Size
copy-popen-20130123.patch application/octet-stream 32.5 KB

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-01-31 03:39:58
Message-ID: 002e01cdff64$a53663b0$efa32b10$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday, January 23, 2013 5:36 PM Etsuro Fujita wrote:
> Hi Amit,

> Thank you for your review.  I’ve rebased and updated the patch.  Please
find attached the patch.

>> Test case issues:
>> ------------------
>> 1. "Broken pipe" is not handled in case of psql "\copy" command;
>>     Issue are as follows:
>>         Following are verified on SuSE-Linux 10.2.
>>         1) psql is exiting when "\COPY xxx TO" command is issued and
command/script is not found
>>                 When popen is called in write mode it is creating valid
file descriptor and when it tries to write to file "Broken pipe" error is >
coming which is not handled.
>>                         psql# \copy pgbench_accounts TO PROGRAM
'../compress.sh pgbench_accounts4.txt'
>>         2) When "\copy" command is in progress then program/command is
killed/"crashed due to any problem"
>>            psql is exiting.

>This is a headache.  I have no idea how to solve this.

I think we can keep it for committer to take a call on this issue.
The other changes done by you in revised patch are fine.

I have found few more minor issues as below:

1. The comment above do_copy can be modified to address the new
functionality it can handle.
/*
* Execute a \copy command (frontend copy). We have to open a file, then
* submit a COPY query to the backend and either feed it data from the
* file or route its response into the file.
*/
bool
do_copy(const char *args)

2.
@@ -256,8 +273,14 @@ do_copy(const char *args)
+ if (options->file == NULL && options->program)
+ {
+ psql_error("program is not supported to stdout/pstdout or
from stdin/pstdin\n");
+ return false;
+ }

should call free_copy_options(options); before return false;

3. \copy command doesn't need semicolon at end, however it was working
previous to your patch, but
now it is giving error.
postgres=# \copy t1 from 'e:\pg_git_code\Data\t1_Data.txt';
e:/pg_git_code/Data/t1_Data.txt';: No such file or directory
e:/pg_git_code/Data/t1_Data.txt';: No such file or directory

4. Please check if OpenPipeStream() it needs to call
if (ReleaseLruFile()),

5. Following in copy.sgml can be changed to make more meaningful as the
first line looks little adhoc.
+ <para>
+ The command that input comes from or that output goes to.
+ The command for COPY FROM, which input comes from, must write its
output
+ to standard output. The command for COPY TO, which output goes to,
must
+ read its input from standard input.
+ </para>

6. Can we have one example of this new syntax, it can make it more
meaningful.

With Regards,
Amit Kapila.


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-20 11:55:03
Message-ID: 004201ce0f61$1e57adb0$5b070910$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Amit,

Thank you for the review.

> From: Amit Kapila [mailto:amit(dot)kapila(at)huawei(dot)com]

> >> Test case issues:
> >> ------------------
> >> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> >>     Issue are as follows:
> >>         Following are verified on SuSE-Linux 10.2.
> >>         1) psql is exiting when "\COPY xxx TO" command is issued and
> command/script is not found
> >>                 When popen is called in write mode it is creating valid
> file descriptor and when it tries to write to file "Broken pipe" error is >
> coming which is not handled.
> >>                         psql# \copy pgbench_accounts TO PROGRAM
> '../compress.sh pgbench_accounts4.txt'
> >>         2) When "\copy" command is in progress then program/command is
> killed/"crashed due to any problem"
> >>            psql is exiting.
>
> >This is a headache.  I have no idea how to solve this.
>
> I think we can keep it for committer to take a call on this issue.

Agreed.

> I have found few more minor issues as below:
>
> 1. The comment above do_copy can be modified to address the new
> functionality it can handle.
> /*
> * Execute a \copy command (frontend copy). We have to open a file, then
> * submit a COPY query to the backend and either feed it data from the
> * file or route its response into the file.
> */
> bool
> do_copy(const char *args)

Done.

> 2.
> @@ -256,8 +273,14 @@ do_copy(const char *args)
> + if (options->file == NULL && options->program)
> + {
> + psql_error("program is not supported to stdout/pstdout or
> from stdin/pstdin\n");
> + return false;
> + }
>
> should call free_copy_options(options); before return false;

Good catch! Done.

> 3. \copy command doesn't need semicolon at end, however it was working
> previous to your patch, but
> now it is giving error.
> postgres=# \copy t1 from 'e:\pg_git_code\Data\t1_Data.txt';
> e:/pg_git_code/Data/t1_Data.txt';: No such file or directory
> e:/pg_git_code/Data/t1_Data.txt';: No such file or directory

Sorry, I've fixed the bug.

> 4. Please check if OpenPipeStream() it needs to call
> if (ReleaseLruFile()),

OpenPipeStream() calls ReleaseLruFile() by itself if necessary.

> 5. Following in copy.sgml can be changed to make more meaningful as the
> first line looks little adhoc.
> + <para>
> + The command that input comes from or that output goes to.
> + The command for COPY FROM, which input comes from, must write its
> output
> + to standard output. The command for COPY TO, which output goes to,
> must
> + read its input from standard input.
> + </para>

I've struggled to make the document more meaningful.

> 6. Can we have one example of this new syntax, it can make it more
> meaningful.

Done.

Sorry for the long delay.

Best regards,
Etsuro Fujita

> With Regards,
> Amit Kapila.
>
>

Attachment Content-Type Size
copy-popen-20130220.patch application/octet-stream 34.6 KB

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-22 10:17:54
Message-ID: 003801ce10e5$e175c4f0$a4614ed0$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday, February 20, 2013 5:25 PM Etsuro Fujita wrote:
> Hi Amit,
>
> Thank you for the review.

Etsuro-san, you are welcome.

> > From: Amit Kapila [mailto:amit(dot)kapila(at)huawei(dot)com]
>
> > >> Test case issues:
> > >> ------------------
> > >> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> > >>     Issue are as follows:
> > >>         Following are verified on SuSE-Linux 10.2.
> > >>         1) psql is exiting when "\COPY xxx TO" command is issued
> > >> and
> > command/script is not found
> > >>                 When popen is called in write mode it is creating
> > >>valid
> > file descriptor and when it tries to write to file "Broken pipe"
> error
> > is > coming which is not handled.
> > >>                         psql# \copy pgbench_accounts TO PROGRAM
> > '../compress.sh pgbench_accounts4.txt'
> > >>         2) When "\copy" command is in progress then
> program/command
> > >> is
> > killed/"crashed due to any problem"
> > >>            psql is exiting.
> >
> > >This is a headache.  I have no idea how to solve this.
> >
> > I think we can keep it for committer to take a call on this issue.
>
> Agreed.
>
> > I have found few more minor issues as below:
> >
> > 1. The comment above do_copy can be modified to address the new
> > functionality it can handle.
> > /*
> > * Execute a \copy command (frontend copy). We have to open a file,
> > then
> > * submit a COPY query to the backend and either feed it data from
> the
> > * file or route its response into the file.
> > */
> > bool
> > do_copy(const char *args)
>
> Done.
>
> > 2.
> > @@ -256,8 +273,14 @@ do_copy(const char *args)
> > + if (options->file == NULL && options->program)
> > + {
> > + psql_error("program is not supported to
> > + stdout/pstdout or
> > from stdin/pstdin\n");
> > + return false;
> > + }
> >
> > should call free_copy_options(options); before return false;
>
> Good catch! Done.
>
> > 3. \copy command doesn't need semicolon at end, however it was
> working
> > previous to your patch, but
> > now it is giving error.
> > postgres=# \copy t1 from 'e:\pg_git_code\Data\t1_Data.txt';
> > e:/pg_git_code/Data/t1_Data.txt';: No such file or directory
> > e:/pg_git_code/Data/t1_Data.txt';: No such file or directory
>
> Sorry, I've fixed the bug.
>
> > 4. Please check if OpenPipeStream() it needs to call
> > if (ReleaseLruFile()),
>
> OpenPipeStream() calls ReleaseLruFile() by itself if necessary.

I have asked this thinking that ReleaseLruFile() may not be useful for
OpenPipeStream,
As I was not sure how the new file descriptors get allocated for popen.
But now again reading popen specs, I got the point that it can be useful.

> > 5. Following in copy.sgml can be changed to make more meaningful as
> > the first line looks little adhoc.
> > + <para>
> > + The command that input comes from or that output goes to.
> > + The command for COPY FROM, which input comes from, must write
> > + its
> > output
> > + to standard output. The command for COPY TO, which output
> goes
> > + to,
> > must
> > + read its input from standard input.
> > + </para>
>
> I've struggled to make the document more meaningful.

To be honest, I am not sure whether introducing pre, post processor
terminology is right or not,
But again I shall let committer decide about this point.

> > 6. Can we have one example of this new syntax, it can make it more
> > meaningful.
>
> Done.
>
> Sorry for the long delay.

All the reported issues are handled in the new patch.

I have one small another doubt that in function parse_slash_copy, you
avoided expand tilde
for program case, which I am not sure is the right thing or not.

I am marking this patch as Ready For Committer.

Notes For Committer
-----------------------
1. "Broken pipe" is not handled in case of psql "\copy" command;
This is currently documented
2. Documentation needs to be checked, especially with focus whether
introducing pre, post processor terminology is
Okay.
3. In function parse_slash_copy, expand tilde is avaoided, is it okay?

With Regards,
Amit Kapila.


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-22 10:43:39
Message-ID: 003601ce10e9$796e76f0$6c4b64d0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Amit,

Thank you for your careful review!

> -----Original Message-----
> From: Amit Kapila [mailto:amit(dot)kapila(at)huawei(dot)com]
> Sent: Friday, February 22, 2013 7:18 PM
> To: 'Etsuro Fujita'; 'pgsql-hackers'
> Subject: RE: [HACKERS] Review : Add hooks for pre- and post-processor
> executables for COPY and \copy
>
> On Wednesday, February 20, 2013 5:25 PM Etsuro Fujita wrote:
> > Hi Amit,
> >
> > Thank you for the review.
>
> Etsuro-san, you are welcome.
>
> > > From: Amit Kapila [mailto:amit(dot)kapila(at)huawei(dot)com]
> >
> > > >> Test case issues:
> > > >> ------------------
> > > >> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> > > >>     Issue are as follows:
> > > >>         Following are verified on SuSE-Linux 10.2.
> > > >>         1) psql is exiting when "\COPY xxx TO" command is issued
> > > >> and
> > > command/script is not found
> > > >>                 When popen is called in write mode it is creating
> > > >>valid
> > > file descriptor and when it tries to write to file "Broken pipe"
> > error
> > > is > coming which is not handled.
> > > >>                         psql# \copy pgbench_accounts TO PROGRAM
> > > '../compress.sh pgbench_accounts4.txt'
> > > >>         2) When "\copy" command is in progress then
> > program/command
> > > >> is
> > > killed/"crashed due to any problem"
> > > >>            psql is exiting.
> > >
> > > >This is a headache.  I have no idea how to solve this.
> > >
> > > I think we can keep it for committer to take a call on this issue.
> >
> > Agreed.
> >
> > > I have found few more minor issues as below:
> > >
> > > 1. The comment above do_copy can be modified to address the new
> > > functionality it can handle.
> > > /*
> > > * Execute a \copy command (frontend copy). We have to open a file,
> > > then
> > > * submit a COPY query to the backend and either feed it data from
> > the
> > > * file or route its response into the file.
> > > */
> > > bool
> > > do_copy(const char *args)
> >
> > Done.
> >
> > > 2.
> > > @@ -256,8 +273,14 @@ do_copy(const char *args)
> > > + if (options->file == NULL && options->program)
> > > + {
> > > + psql_error("program is not supported to
> > > + stdout/pstdout or
> > > from stdin/pstdin\n");
> > > + return false;
> > > + }
> > >
> > > should call free_copy_options(options); before return false;
> >
> > Good catch! Done.
> >
> > > 3. \copy command doesn't need semicolon at end, however it was
> > working
> > > previous to your patch, but
> > > now it is giving error.
> > > postgres=# \copy t1 from 'e:\pg_git_code\Data\t1_Data.txt';
> > > e:/pg_git_code/Data/t1_Data.txt';: No such file or directory
> > > e:/pg_git_code/Data/t1_Data.txt';: No such file or directory
> >
> > Sorry, I've fixed the bug.
> >
> > > 4. Please check if OpenPipeStream() it needs to call
> > > if (ReleaseLruFile()),
> >
> > OpenPipeStream() calls ReleaseLruFile() by itself if necessary.
>
> I have asked this thinking that ReleaseLruFile() may not be useful for
> OpenPipeStream,
> As I was not sure how the new file descriptors get allocated for popen.
> But now again reading popen specs, I got the point that it can be useful.
>
> > > 5. Following in copy.sgml can be changed to make more meaningful as
> > > the first line looks little adhoc.
> > > + <para>
> > > + The command that input comes from or that output goes to.
> > > + The command for COPY FROM, which input comes from, must write
> > > + its
> > > output
> > > + to standard output. The command for COPY TO, which output
> > goes
> > > + to,
> > > must
> > > + read its input from standard input.
> > > + </para>
> >
> > I've struggled to make the document more meaningful.
>
> To be honest, I am not sure whether introducing pre, post processor
> terminology is right or not,
> But again I shall let committer decide about this point.

Agreed.

> > > 6. Can we have one example of this new syntax, it can make it more
> > > meaningful.
> >
> > Done.
> >
> > Sorry for the long delay.
>
> All the reported issues are handled in the new patch.
>
> I have one small another doubt that in function parse_slash_copy, you
> avoided expand tilde
> for program case, which I am not sure is the right thing or not.

Sorry, I'm not sure that, too. I'd like to leave this for committers.

> I am marking this patch as Ready For Committer.

Thanks!

Best regards,
Etsuro Fujita

> Notes For Committer
> -----------------------
> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> This is currently documented
> 2. Documentation needs to be checked, especially with focus whether
> introducing pre, post processor terminology is
> Okay.
> 3. In function parse_slash_copy, expand tilde is avaoided, is it okay?
>
>
> With Regards,
> Amit Kapila.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 'Amit Kapila' <amit(dot)kapila(at)huawei(dot)com>, 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-26 19:11:25
Message-ID: 512D08DD.8050109@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.02.2013 12:43, Etsuro Fujita wrote:
>>>>>> 1. "Broken pipe" is not handled in case of psql "\copy" command;
>>>>>> Issue are as follows:
>>>>>> Following are verified on SuSE-Linux 10.2.
>>>>>> 1) psql is exiting when "\COPY xxx TO" command is issued
>>>>>> and
>>>> command/script is not found
>>>>>> When popen is called in write mode it is creating
>>>>>> valid
>>>> file descriptor and when it tries to write to file "Broken pipe"
>>> error
>>>> is> coming which is not handled.
>>>>>> psql# \copy pgbench_accounts TO PROGRAM
>>>> '../compress.sh pgbench_accounts4.txt'
>>>>>> 2) When "\copy" command is in progress then
>>> program/command
>>>>>> is
>>>> killed/"crashed due to any problem"
>>>>>> psql is exiting.
>>>>
>>>>> This is a headache. I have no idea how to solve this.
>>>>
>>>> I think we can keep it for committer to take a call on this issue.
>>>
>>> Agreed.

Fortunately this one is easy. We just need to ignore SIGPIPE, by calling
pqsignal(SIGPIPE, SIG_IGN) before popen(). We already do the same in
psql when the query output is piped to a pager, in PageOutput.

>>>> 5. Following in copy.sgml can be changed to make more meaningful as
>>>> the first line looks little adhoc.
>>>> +<para>
>>>> + The command that input comes from or that output goes to.
>>>> + The command for COPY FROM, which input comes from, must write
>>>> + its
>>>> output
>>>> + to standard output. The command for COPY TO, which output
>>> goes
>>>> + to,
>>>> must
>>>> + read its input from standard input.
>>>> +</para>
>>>
>>> I've struggled to make the document more meaningful.
>>
>> To be honest, I am not sure whether introducing pre, post processor
>> terminology is right or not,
>> But again I shall let committer decide about this point.
>
> Agreed.

I changed the terminology to use terms like "the command specified with
PROGRAM", instead of talking about pre- and post-processsors.

>> I have one small another doubt that in function parse_slash_copy, you
>> avoided expand tilde
>> for program case, which I am not sure is the right thing or not.
>
> Sorry, I'm not sure that, too. I'd like to leave this for committers.

That seems correct. The shell will do tilde expansion with popen(), we
don't need to do it ourselves.

There's one oddity in the psql \copy syntax. This is actually present in
earlier versions too, but I think we should fix it now that we extend
the syntax:

-- Writes to standard output. There's no way to write to a file called
"stdout".
\copy foo to 'stdout'

I think we should change the interpretation of the above so that it
writes to a file called "stdout". It's possible that there's an
application out there that relies on that to write to stdout, but it's
not hard to fix the application. A small note in the release notes might
be in order.

Also, I think we should require the command to be quoted in \copy. Ie.
let's forbid this:

\copy pgbench_accounts to program /bin/cat>/dev/null

and require that it's written as:

\copy pgbench_accounts to program '/bin/cat>/dev/null'

We don't require quoting for filenames, which I find a bit weird too,
but it seems particularly confusing for a shell command.

Attached is a new version of this patch that I almost committed, but one
thing caught my eye at the last minute: The error reporting is not very
user-friendly. If the program fails after reading/writing some rows, the
reason is printed to the log, but not the user:

postgres=# copy foo to program '/tmp/midway-crash';
ERROR: could not execute command "/tmp/midway-crash"

the log has more details:

LOG: child process exited with exit code 10
STATEMENT: copy foo to program '/tmp/midway-crash';
ERROR: could not execute command "/tmp/midway-crash"
STATEMENT: copy foo to program '/tmp/midway-crash';

I think we should arrange for the "child process exited with exit code
10" to be printed as errdetail to the client. Similarly, with psql \copy
command, the "child process exited with exit code 10" command shouldn't
be printed straight to stderr, but should go through psql_error.

I'll try to refactor pclose_check tomorrow to do that. Meanwhile, please
take a look at the attached if you have the time. I rewrote much of the
docs changes, and some comments.

- Heikki

Attachment Content-Type Size
copy-popen-heikki-1.patch text/x-diff 34.9 KB

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-27 06:32:26
Message-ID: 006101ce14b4$36525fc0$a2f71f40$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday, February 27, 2013 12:41 AM Heikki Linnakangas wrote:
> On 22.02.2013 12:43, Etsuro Fujita wrote:
> >>>>>> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> >>>>>> Issue are as follows:
> >>>>>> Following are verified on SuSE-Linux 10.2.
> >>>>>> 1) psql is exiting when "\COPY xxx TO" command is
> issued
> >>>>>> and
> >>>> command/script is not found
> >>>>>> When popen is called in write mode it is
> >>>>>> creating valid
> >>>> file descriptor and when it tries to write to file "Broken pipe"
> >>> error
> >>>> is> coming which is not handled.
> >>>>>> psql# \copy pgbench_accounts TO PROGRAM
> >>>> '../compress.sh pgbench_accounts4.txt'
> >>>>>> 2) When "\copy" command is in progress then
> >>> program/command
> >>>>>> is
> >>>> killed/"crashed due to any problem"
> >>>>>> psql is exiting.
> >>>>
> >>>>> This is a headache. I have no idea how to solve this.
> >>>>
> >>>> I think we can keep it for committer to take a call on this issue.
> >>>
> >>> Agreed.
>
> Fortunately this one is easy. We just need to ignore SIGPIPE, by
> calling pqsignal(SIGPIPE, SIG_IGN) before popen(). We already do the
> same in psql when the query output is piped to a pager, in PageOutput.

So are you planning to call the same in do_copy as well; in current patch
it is not there.

> >>>> 5. Following in copy.sgml can be changed to make more meaningful
> as
> >>>> the first line looks little adhoc.
> >>>> +<para>
> >>>> + The command that input comes from or that output goes to.
> >>>> + The command for COPY FROM, which input comes from, must
> >>>> +write its
> >>>> output
> >>>> + to standard output. The command for COPY TO, which output
> >>> goes
> >>>> + to,
> >>>> must
> >>>> + read its input from standard input.
> >>>> +</para>
> >>>
> >>> I've struggled to make the document more meaningful.
> >>
> >> To be honest, I am not sure whether introducing pre, post processor
> >> terminology is right or not, But again I shall let committer decide
> >> about this point.
> >
> > Agreed.
>
> I changed the terminology to use terms like "the command specified with
> PROGRAM", instead of talking about pre- and post-processsors.
>
> >> I have one small another doubt that in function parse_slash_copy,
> you
> >> avoided expand tilde for program case, which I am not sure is the
> >> right thing or not.
> >
> > Sorry, I'm not sure that, too. I'd like to leave this for
> committers.
>
> That seems correct. The shell will do tilde expansion with popen(), we
> don't need to do it ourselves.
>
> There's one oddity in the psql \copy syntax. This is actually present
> in earlier versions too, but I think we should fix it now that we
> extend the syntax:
>
> -- Writes to standard output. There's no way to write to a file
> called "stdout".
> \copy foo to 'stdout'
>
> I think we should change the interpretation of the above so that it
> writes to a file called "stdout". It's possible that there's an
> application out there that relies on that to write to stdout, but it's
> not hard to fix the application. A small note in the release notes
> might be in order.
>
> Also, I think we should require the command to be quoted in \copy. Ie.
> let's forbid this:
>
> \copy pgbench_accounts to program /bin/cat>/dev/null
>
> and require that it's written as:
>
> \copy pgbench_accounts to program '/bin/cat>/dev/null'
>
> We don't require quoting for filenames, which I find a bit weird too,
> but it seems particularly confusing for a shell command.
>
> Attached is a new version of this patch that I almost committed, but
> one thing caught my eye at the last minute: The error reporting is not
> very user-friendly. If the program fails after reading/writing some
> rows, the reason is printed to the log, but not the user:
>
> postgres=# copy foo to program '/tmp/midway-crash';
> ERROR: could not execute command "/tmp/midway-crash"
>
> the log has more details:
>
> LOG: child process exited with exit code 10
> STATEMENT: copy foo to program '/tmp/midway-crash';
> ERROR: could not execute command "/tmp/midway-crash"
> STATEMENT: copy foo to program '/tmp/midway-crash';
>
> I think we should arrange for the "child process exited with exit code
> 10" to be printed as errdetail to the client. Similarly, with psql
> \copy command, the "child process exited with exit code 10" command
> shouldn't be printed straight to stderr, but should go through
> psql_error.
>
> I'll try to refactor pclose_check tomorrow to do that. Meanwhile,
> please take a look at the attached if you have the time. I rewrote much
> of the docs changes, and some comments.

If there is semicolon at end, it takes it into account for creating
filename.
\copy foo to c:\data\foo.out;
This problem was fixed in previous patch, but I think handling of quotes has
changed this behavior.

With Regards,
Amit Kapila.


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-27 09:38:54
Message-ID: 005b01ce14ce$41ffe170$c5ffa450$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Thank you for the work!

> From: Amit Kapila [mailto:amit(dot)kapila(at)huawei(dot)com]

> On Wednesday, February 27, 2013 12:41 AM Heikki Linnakangas wrote:

> > There's one oddity in the psql \copy syntax. This is actually present
> > in earlier versions too, but I think we should fix it now that we
> > extend the syntax:
> >
> > -- Writes to standard output. There's no way to write to a file
> > called "stdout".
> > \copy foo to 'stdout'
> >
> > I think we should change the interpretation of the above so that it
> > writes to a file called "stdout". It's possible that there's an
> > application out there that relies on that to write to stdout, but it's
> > not hard to fix the application. A small note in the release notes
> > might be in order.
> >
> > Also, I think we should require the command to be quoted in \copy. Ie.
> > let's forbid this:
> >
> > \copy pgbench_accounts to program /bin/cat>/dev/null
> >
> > and require that it's written as:
> >
> > \copy pgbench_accounts to program '/bin/cat>/dev/null'
> >
> > We don't require quoting for filenames, which I find a bit weird too,
> > but it seems particularly confusing for a shell command.

Agreed. ISTM that the comment on parse_slash_copy() needs to be changed:

* table name can be double-quoted and can have a schema part.
* column names can be double-quoted.
* filename can be single-quoted like SQL literals.
* command can be single-quoted like SQL literals.

The last line must be:

* command must be single-quoted like SQL literals.

> > Attached is a new version of this patch that I almost committed, but
> > one thing caught my eye at the last minute: The error reporting is not
> > very user-friendly. If the program fails after reading/writing some
> > rows, the reason is printed to the log, but not the user:
> >
> > postgres=# copy foo to program '/tmp/midway-crash';
> > ERROR: could not execute command "/tmp/midway-crash"
> >
> > the log has more details:
> >
> > LOG: child process exited with exit code 10
> > STATEMENT: copy foo to program '/tmp/midway-crash';
> > ERROR: could not execute command "/tmp/midway-crash"
> > STATEMENT: copy foo to program '/tmp/midway-crash';
> >
> > I think we should arrange for the "child process exited with exit code
> > 10" to be printed as errdetail to the client. Similarly, with psql
> > \copy command, the "child process exited with exit code 10" command
> > shouldn't be printed straight to stderr, but should go through
> > psql_error.
> >
> > I'll try to refactor pclose_check tomorrow to do that. Meanwhile,
> > please take a look at the attached if you have the time. I rewrote much
> > of the docs changes, and some comments.

Looks fine to me.

> If there is semicolon at end, it takes it into account for creating
> filename.
> \copy foo to c:\data\foo.out;
> This problem was fixed in previous patch, but I think handling of quotes has
> changed this behavior.

ISTM that the following part at parse_slash_copy() needs to be changed:

/* { 'filename' | PROGRAM 'command' | STDIN | STDOUT | PSTDIN | PSTDOUT } */
token = strtokx(NULL, whitespace, NULL, "'",
0, false, false, pset.encoding);
if (!token)
goto error;

strtokx() in the above should be called in the following way:

token = strtokx(NULL, whitespace, ";", "'",
0, false, false, pset.encoding);

Thanks,

Best regards,
Etsuro Fujita


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'Etsuro Fujita' <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-27 13:00:04
Message-ID: 512E0354.4050206@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27.02.2013 08:32, Amit Kapila wrote:
> If there is semicolon at end, it takes it into account for creating
> filename.
> \copy foo to c:\data\foo.out;
> This problem was fixed in previous patch, but I think handling of quotes has
> changed this behavior.

This is existing behavior, that creates a file called "foo.out;" on
previous psql versions as well. I agree it's almost certainly not what
the user intended, but that's a separate patch.

- Heikki


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 'Amit Kapila' <amit(dot)kapila(at)huawei(dot)com>, 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-27 16:22:53
Message-ID: 512E32DD.6010909@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27.02.2013 11:38, Etsuro Fujita wrote:
> Agreed. ISTM that the comment on parse_slash_copy() needs to be changed:
>
> * table name can be double-quoted and can have a schema part.
> * column names can be double-quoted.
> * filename can be single-quoted like SQL literals.
> * command can be single-quoted like SQL literals.
>
> The last line must be:
>
> * command must be single-quoted like SQL literals.

Fixed that.

>>> Attached is a new version of this patch that I almost committed, but
>>> one thing caught my eye at the last minute: The error reporting is not
>>> very user-friendly. If the program fails after reading/writing some
>>> rows, the reason is printed to the log, but not the user:
>>>
>>> postgres=# copy foo to program '/tmp/midway-crash';
>>> ERROR: could not execute command "/tmp/midway-crash"
>>>
>>> the log has more details:
>>>
>>> LOG: child process exited with exit code 10
>>> STATEMENT: copy foo to program '/tmp/midway-crash';
>>> ERROR: could not execute command "/tmp/midway-crash"
>>> STATEMENT: copy foo to program '/tmp/midway-crash';
>>>
>>> I think we should arrange for the "child process exited with exit code
>>> 10" to be printed as errdetail to the client. Similarly, with psql
>>> \copy command, the "child process exited with exit code 10" command
>>> shouldn't be printed straight to stderr, but should go through
>>> psql_error.
>>>
>>> I'll try to refactor pclose_check tomorrow to do that. Meanwhile,
>>> please take a look at the attached if you have the time. I rewrote much
>>> of the docs changes, and some comments.
>
> Looks fine to me.

Ok, committed with the changes to the error handling. I refactored the
logic to construct a human-readable string from pclose_check() to a new
function, and used that.

Thanks for the patch, and thanks Amit for the review!

- Heikki