Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Craig Ringer'" <ringerc(at)ringerc(dot)id(dot)au>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Date: 2012-11-14 11:30:49
Message-ID: 001d01cdc25b$7eee6ea0$7ccb4be0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>
> > I wrote:
> > > "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > >> I have a question. I think it would be also better to extend the syntax
> > >> for the SQL COPY command in the same way, ie,
> > >> COPY foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with
> > >> format 'csv'
> >
> > > Yeah, sure --- that case is already superuser-only, so why not give it
> > > the option of being a popen instead of just fopen.
> >
> > BTW, one thought that comes to mind is that such an operation is
> > extremely likely to fail under environments such as SELinux. That's
> > not necessarily a reason not to do it, but we should be wary of
> > promising that it will work everywhere. Probably a documentation note
> > about this would be enough.
>
> OK I'll revise the patch.

I've revised the patch. In this version a user can specify hooks for pre- and
post-processor executables for COPY and \copy in the follwoing way:

$ echo '/bin/gunzip -c $1' > decompress.sh
$ chmod +x decompress.sh

In the case of the COPY command,

postgres=# COPY foo FROM '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz'
WITH (format 'csv');

Also, in the case of the \copy instruction,

postgres=# \copy foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz'
with (format 'csv')

As shown in the example above, I've assumed that the syntax for this option for
e.g., the COPY command is:

COPY table_name FROM 'progname filename' WITH ...
COPY table_name TO 'progname filename' WITH ...

Here, progname for COPY IN is the user-supplied program that takes filename as
its argument and that writes on standard output. Also, prgoname for COPY OUT is
the user-supplied program that reads standard input and writes to filename taken
as its argument. This makes simple the identification and verification of
progname and filename.

Todo:
* Documentation including documentation note about the limitation for
environments such as SELinux mentioned by Tom.
* More test

Any comments and suggestions are welcomed.

Thanks,

Best regards,
Etsuro Fujita

Attachment Content-Type Size
copy-popen-20121114.patch application/octet-stream 14.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2012-11-14 11:41:20 Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Previous Message Amit kapila 2012-11-14 11:26:26 Re: Performance Improvement by reducing WAL for Update Operation