Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date: 2013-02-27 06:32:26
Message-ID: 006101ce14b4$36525fc0$a2f71f40$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, February 27, 2013 12:41 AM Heikki Linnakangas wrote:
> On 22.02.2013 12:43, Etsuro Fujita wrote:
> >>>>>> 1. "Broken pipe" is not handled in case of psql "\copy" command;
> >>>>>> Issue are as follows:
> >>>>>> Following are verified on SuSE-Linux 10.2.
> >>>>>> 1) psql is exiting when "\COPY xxx TO" command is
> issued
> >>>>>> and
> >>>> command/script is not found
> >>>>>> When popen is called in write mode it is
> >>>>>> creating valid
> >>>> file descriptor and when it tries to write to file "Broken pipe"
> >>> error
> >>>> is> coming which is not handled.
> >>>>>> psql# \copy pgbench_accounts TO PROGRAM
> >>>> '../compress.sh pgbench_accounts4.txt'
> >>>>>> 2) When "\copy" command is in progress then
> >>> program/command
> >>>>>> is
> >>>> killed/"crashed due to any problem"
> >>>>>> psql is exiting.
> >>>>
> >>>>> This is a headache. I have no idea how to solve this.
> >>>>
> >>>> I think we can keep it for committer to take a call on this issue.
> >>>
> >>> Agreed.
>
> Fortunately this one is easy. We just need to ignore SIGPIPE, by
> calling pqsignal(SIGPIPE, SIG_IGN) before popen(). We already do the
> same in psql when the query output is piped to a pager, in PageOutput.

So are you planning to call the same in do_copy as well; in current patch
it is not there.

> >>>> 5. Following in copy.sgml can be changed to make more meaningful
> as
> >>>> the first line looks little adhoc.
> >>>> +<para>
> >>>> + The command that input comes from or that output goes to.
> >>>> + The command for COPY FROM, which input comes from, must
> >>>> +write its
> >>>> output
> >>>> + to standard output. The command for COPY TO, which output
> >>> goes
> >>>> + to,
> >>>> must
> >>>> + read its input from standard input.
> >>>> +</para>
> >>>
> >>> I've struggled to make the document more meaningful.
> >>
> >> To be honest, I am not sure whether introducing pre, post processor
> >> terminology is right or not, But again I shall let committer decide
> >> about this point.
> >
> > Agreed.
>
> I changed the terminology to use terms like "the command specified with
> PROGRAM", instead of talking about pre- and post-processsors.
>
> >> I have one small another doubt that in function parse_slash_copy,
> you
> >> avoided expand tilde for program case, which I am not sure is the
> >> right thing or not.
> >
> > Sorry, I'm not sure that, too. I'd like to leave this for
> committers.
>
> That seems correct. The shell will do tilde expansion with popen(), we
> don't need to do it ourselves.
>
> There's one oddity in the psql \copy syntax. This is actually present
> in earlier versions too, but I think we should fix it now that we
> extend the syntax:
>
> -- Writes to standard output. There's no way to write to a file
> called "stdout".
> \copy foo to 'stdout'
>
> I think we should change the interpretation of the above so that it
> writes to a file called "stdout". It's possible that there's an
> application out there that relies on that to write to stdout, but it's
> not hard to fix the application. A small note in the release notes
> might be in order.
>
> Also, I think we should require the command to be quoted in \copy. Ie.
> let's forbid this:
>
> \copy pgbench_accounts to program /bin/cat>/dev/null
>
> and require that it's written as:
>
> \copy pgbench_accounts to program '/bin/cat>/dev/null'
>
> We don't require quoting for filenames, which I find a bit weird too,
> but it seems particularly confusing for a shell command.
>
> Attached is a new version of this patch that I almost committed, but
> one thing caught my eye at the last minute: The error reporting is not
> very user-friendly. If the program fails after reading/writing some
> rows, the reason is printed to the log, but not the user:
>
> postgres=# copy foo to program '/tmp/midway-crash';
> ERROR: could not execute command "/tmp/midway-crash"
>
> the log has more details:
>
> LOG: child process exited with exit code 10
> STATEMENT: copy foo to program '/tmp/midway-crash';
> ERROR: could not execute command "/tmp/midway-crash"
> STATEMENT: copy foo to program '/tmp/midway-crash';
>
> I think we should arrange for the "child process exited with exit code
> 10" to be printed as errdetail to the client. Similarly, with psql
> \copy command, the "child process exited with exit code 10" command
> shouldn't be printed straight to stderr, but should go through
> psql_error.
>
> I'll try to refactor pclose_check tomorrow to do that. Meanwhile,
> please take a look at the attached if you have the time. I rewrote much
> of the docs changes, and some comments.

If there is semicolon at end, it takes it into account for creating
filename.
\copy foo to c:\data\foo.out;
This problem was fixed in previous patch, but I think handling of quotes has
changed this behavior.

With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Miroslav Šimulčík 2013-02-27 09:20:47 Check if trigger was fired deferred
Previous Message Pavel Stehule 2013-02-27 05:56:56 Re: bugfix: --echo-hidden is not supported by \sf statements