psql -f inconsistency with "copy from stdin"

Lists: pgsql-hackers
From: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: psql -f inconsistency with "copy from stdin"
Date: 2000-01-11 23:35:24
Message-ID: 200001112335.QAA00351@biology.nmsu.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Today I ran into an inconsistency between two versions of postgresql
in how psql handles copies from stdin. At this point I am not sure
how the rewrite of psql does things, but thought I'd mention the
problem in case someone with it installed can check.

The issue is how the command

psql -f test.sql db < test.dat

is treated, given the following files:

-- test.sql
drop table test;
create table test (name text);
copy test from stdin;
select * from test;

and

test.dat
a
b

Specifically v6.4.2 and v6.5.2 differ in the outcome, with v6.4.2
producing what I would expect and v6.5.2 producing anomalous output.
Note that performing the copy as

psql -c "copy test from stdin" db < test.dat

works fine in either case.

v6.4.2 output: The contents of test.dat are read into the table as
one might expect having redirected that file to stdin and copying from
stdin.

v6.5.2 output: The contents of test.dat are not read into the table at
all. Instead, the remainder of the test.sql file (i.e., select * ...)
are read into the table.

How does the current version behave when performing these copies? If
it still behaves like 6.5.2, I suspect there is some bug in handling
the copy command.

Cheers,
Brook


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 03:30:43
Message-ID: Pine.LNX.4.21.0001120359290.3735-100000@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

7.0 behaves like 6.5.* in this regard because the code is pretty much the
same. Thanks for pointing this out.

On 2000-01-12, Brook Milligan mentioned:

> Today I ran into an inconsistency between two versions of postgresql
> in how psql handles copies from stdin. At this point I am not sure
> how the rewrite of psql does things, but thought I'd mention the
> problem in case someone with it installed can check.
>
> The issue is how the command
>
> psql -f test.sql db < test.dat
>
> is treated, given the following files:
>
> -- test.sql
> drop table test;
> create table test (name text);
> copy test from stdin;
> select * from test;
>
> and
>
> test.dat
> a
> b
>
> Specifically v6.4.2 and v6.5.2 differ in the outcome, with v6.4.2
> producing what I would expect and v6.5.2 producing anomalous output.
> Note that performing the copy as
>
> psql -c "copy test from stdin" db < test.dat
>
> works fine in either case.
>
> v6.4.2 output: The contents of test.dat are read into the table as
> one might expect having redirected that file to stdin and copying from
> stdin.
>
> v6.5.2 output: The contents of test.dat are not read into the table at
> all. Instead, the remainder of the test.sql file (i.e., select * ...)
> are read into the table.
>
> How does the current version behave when performing these copies? If
> it still behaves like 6.5.2, I suspect there is some bug in handling
> the copy command.
>
> Cheers,
> Brook
>
>
> ************
>
>

--
Peter Eisentraut Sernanders väg 10:115
peter_e(at)gmx(dot)net 75262 Uppsala
http://yi.org/peter-e/ Sweden


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 06:42:56
Message-ID: 9069.947659376@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> 7.0 behaves like 6.5.* in this regard because the code is pretty much the
> same. Thanks for pointing this out.

Of course, the question is which way is right...

I can see the potential usefulness of doing
psql -f driving.script <data.file
but on the other hand, it bothers me a good deal that a script
containing
COPY table FROM STDIN;
... data here ...
\.
(as generated by such unheard-of, seldom-used utilities as pg_dump)
would work when sourced by psql <pgdump.script and *fail* when sourced
by psql -f pgdump.script. But that's what will happen if we change
it back.

I suspect the change in behavior from 6.4 to 6.5 may have been a
deliberate change to avoid this failure mode. It'd be worth checking
the archives to see if you can find any discussion about it.

It seems to me that we ought to provide both behaviors, but make sure
that the one that supports data-in-the-script is the one invoked by
COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy
command can be set up to support the other alternative.

regards, tom lane


From: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: peter_e(at)gmx(dot)net, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 15:19:23
Message-ID: 200001121519.IAA03451@biology.nmsu.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Of course, the question is which way is right...

I can see the potential usefulness of doing
psql -f driving.script <data.file
but on the other hand, it bothers me a good deal that a script
containing
COPY table FROM STDIN;
... data here ...
\.
(as generated by such unheard-of, seldom-used utilities as pg_dump)
would work when sourced by psql <pgdump.script and *fail* when sourced
by psql -f pgdump.script. But that's what will happen if we change
it back.

I suspect the change in behavior from 6.4 to 6.5 may have been a
deliberate change to avoid this failure mode. It'd be worth checking
the archives to see if you can find any discussion about it.

It seems to me that we ought to provide both behaviors, but make sure
that the one that supports data-in-the-script is the one invoked by
COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy
command can be set up to support the other alternative.

But isn't there a greater difference between copy and \copy than this?
Doesn't one act on the frontend and one on the backend? There needs
to be a mechanism for copying data in through the front end without
special permissions.

Also, it seems unfortunate from a semantics point of view to have COPY
FROM STDIN not actually refer to the stdin file of the process.
Perhaps that is necessary to preserve compatability with old pg_dump
(new versions could be changed in this regard of course), but it is
not what I would naturally expect STDIN to mean in the context of 30
years of Unix development. Further, this use of STDIN clearly
conflicts with the meaning of STDOUT in the analogous copy out command
which doesn't insert the output into the script file but rather
directs it to the stdout file.

In order to maintain some compatability with these broader uses of the
terms STDIN/STDOUT (while still supporting previous pg_dump scripts,
at least for awhile), I think it is worth exploring some options. A
few ideas are:

- Introduce a new syntax for the 6.5.2 here-doc semantics.
Possibilities might include COPY FROM HERE (copy ends at EOF or \.)
or COPY UNTIL <tag> (copy ends at matching <tag>, like shell
here-docs). pg_dump would have to be changed to correspond.

- Introduce a new flag to psql to differentiate the interpretation of
COPY FROM STDIN. This seems confusing to users, but might be
worthwhile (but become deprecated after a few releases) if the
syntax is changed and old pg_dump scripts need supporting. New
scripts and new pg_dump needn't worry about this if they use the new
syntax.

Cheers,
Brook


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
Cc: peter_e(at)gmx(dot)net, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 15:49:34
Message-ID: 9817.947692174@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Brook Milligan <brook(at)biology(dot)nmsu(dot)edu> writes:
> It seems to me that we ought to provide both behaviors, but make sure
> that the one that supports data-in-the-script is the one invoked by
> COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy
> command can be set up to support the other alternative.

> But isn't there a greater difference between copy and \copy than this?
> Doesn't one act on the frontend and one on the backend?

Not when it's COPY FROM STDIN or TO STDOUT --- from the backend's point of
view, that means transfer data from or to the frontend. What psql
does with it is psql's concern.

(Actually, \copy is implemented by sending a COPY FROM STDIN/TO STDOUT
command to the backend; the backend can't tell the difference between
the two cases, and has no way to know where the data is really coming
from or going to on the client side.)

> - Introduce a new syntax for the 6.5.2 here-doc semantics.
> Possibilities might include COPY FROM HERE (copy ends at EOF or \.)

Changing the SQL command is the wrong thing to think about, because
the parameter would only be known at the backend which is not where
it needs to be known to change psql's behavior. Furthermore, from
the backend's point of view it *is* sending to or from the only
"user interface" it's got. So I don't think there's anything wrong
with the definition of the SQL COPY command. You should be thinking
about adding options to psql's \copy, instead, if you want more
flexibility in controlling where psql gets or puts data.

> pg_dump would have to be changed to correspond.

IMHO any proposal that requires changing pg_dump is a non-starter,
because it will fail when people try to load 6.5 or earlier dumps
into 7.0. But fortunately, pg_dump doesn't use \copy ...

regards, tom lane


From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>, peter_e(at)gmx(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 16:19:59
Message-ID: Pine.BSF.4.21.0001121218080.46499-100000@thelab.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 12 Jan 2000, Tom Lane wrote:

> > pg_dump would have to be changed to correspond.
>
> IMHO any proposal that requires changing pg_dump is a non-starter,
> because it will fail when people try to load 6.5 or earlier dumps
> into 7.0. But fortunately, pg_dump doesn't use \copy ...

I'm confused here...why would "any proposal that requires changing pg_dump
is a non-starter"? How does changing pg_dump in v7.0 affect pg_dump in
v6.5?

As long as I can reload my v6.5 data into a v7.0 database using the
pg_dump from v6.5, confused as to why v7.0s pg_dump matters...

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 19:38:29
Message-ID: Pine.LNX.4.21.0001121812260.1646-100000@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

After further contemplation I am not completely sure which way is the
correct behaviour. Consider me doing this:

--test.sql
COPY foo FROM stdin;
data data
data data
SELECT * FROM foo;

and running psql -f test.sql < (anything) on it. Then I would expect it
to behave the other way.

The -f option is just another way of saying "get the input from there". If
you use both -f and stdin you're in essence saying "get the input from
there and there", and that feature does not exist in psql and would be
hard to extend to the general case.

On 2000-01-12, Brook Milligan mentioned:

> Today I ran into an inconsistency between two versions of postgresql
> in how psql handles copies from stdin. At this point I am not sure
> how the rewrite of psql does things, but thought I'd mention the
> problem in case someone with it installed can check.
>
> The issue is how the command
>
> psql -f test.sql db < test.dat
>
> is treated, given the following files:
>
> -- test.sql
> drop table test;
> create table test (name text);
> copy test from stdin;
> select * from test;
>
> and
>
> test.dat
> a
> b
>
> Specifically v6.4.2 and v6.5.2 differ in the outcome, with v6.4.2
> producing what I would expect and v6.5.2 producing anomalous output.
> Note that performing the copy as
>
> psql -c "copy test from stdin" db < test.dat
>
> works fine in either case.
>
> v6.4.2 output: The contents of test.dat are read into the table as
> one might expect having redirected that file to stdin and copying from
> stdin.
>
> v6.5.2 output: The contents of test.dat are not read into the table at
> all. Instead, the remainder of the test.sql file (i.e., select * ...)
> are read into the table.
>
> How does the current version behave when performing these copies? If
> it still behaves like 6.5.2, I suspect there is some bug in handling
> the copy command.
>
> Cheers,
> Brook
>
>
> ************
>
>

--
Peter Eisentraut Sernanders väg 10:115
peter_e(at)gmx(dot)net 75262 Uppsala
http://yi.org/peter-e/ Sweden


From: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
To: peter_e(at)gmx(dot)net
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 20:54:20
Message-ID: 200001122054.NAA04232@biology.nmsu.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The -f option is just another way of saying "get the input from there". If
you use both -f and stdin you're in essence saying "get the input from
there and there", and that feature does not exist in psql and would be
hard to extend to the general case.

But there are specifically two kinds of input involved here [*]:

- input of SQL commands and such to psql
- input of data to a COPY command

To me these are conceptually very distinct (in much the same way you
have distinguished already between various output streams; in fact,
I'm not sure how you have matched those with the output stream from
COPY, but it might be relevant to think about that in light of this
discussion). Thus, to me it makes sense to say "take input from there
and there," as long as it is clear that one "there" refers to one
input stream and the other to the other one. For example, -f
naturally refers to the first one above, while the STDIN naturally
refers to the second.

Saying that -f should override all other sources of input is
inconsistent in its own way; after all, that doesn't override a COPY
FROM "filename" command, does it? In that case, you maintain a
distinction between two different input streams. It seems that
dropping that distinction for the special case of "filename" == STDIN
is introducing unnecessary confusion into the semantics of commands.

In short, I'm not really convinced that it is unreasonable to expect a
command like COPY (or \copy) to be able to associate itself with an
input (or output) stream that is different from that implied by -f,
given that the nature of the various I/O streams is so different and
clearly defined.

Cheers,
Brook

[*] I'm not sure what you mean by the "general case," but I can't
think of any other commands, at least SQL commands, that are naturally
associated with more than one input stream, namely the source of the
command itself which may include embedded data. Unless I'm missing
something here, I suspect the "general case" is just fine and doesn't
interact with the problem I raised. What is problematical is the
special case of a command (perhaps there are others?) that inherently
involves more than one input stream: the source of the command itself
and the source of data upon which the command operates.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-12 23:29:31
Message-ID: Pine.LNX.4.21.0001122057120.1646-100000@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2000-01-12, Tom Lane mentioned:

> It seems to me that we ought to provide both behaviors, but make sure
> that the one that supports data-in-the-script is the one invoked by
> COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy
> command can be set up to support the other alternative.

\copy from stdin is not used yet. That would work. There might be issues
I'm overlooking now, but anything else that came up in this thread will
most likely not work in the general case.

--
Peter Eisentraut Sernanders väg 10:115
peter_e(at)gmx(dot)net 75262 Uppsala
http://yi.org/peter-e/ Sweden


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>, peter_e(at)gmx(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-13 00:20:04
Message-ID: 11019.947722804@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The Hermit Hacker <scrappy(at)hub(dot)org> writes:
> On Wed, 12 Jan 2000, Tom Lane wrote:
>> IMHO any proposal that requires changing pg_dump is a non-starter,
>> because it will fail when people try to load 6.5 or earlier dumps
>> into 7.0. But fortunately, pg_dump doesn't use \copy ...

> I'm confused here...why would "any proposal that requires changing pg_dump
> is a non-starter"? How does changing pg_dump in v7.0 affect pg_dump in
> v6.5?

Because people will be using 6.5 pg_dump to make dump scripts that they
will then try to load into 7.0 with 7.0's psql. If we change the way
that COPY FROM STDIN is interpreted, we risk trouble with those scripts.

I like Peter's suggestion of defining "\copy from stdin" to mean
"read from psql's stdin". That would leave the SQL command COPY FROM
STDIN for the other case where the data is in-line in the script.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] psql -f inconsistency with "copy from stdin"
Date: 2000-01-14 22:25:44
Message-ID: Pine.LNX.4.21.0001140134370.1917-100000@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Okay, this is the new law:

copy x from stdin;

"stdin" is whereever the actual copy from stdin command line came from.
This is that way because "stdin" in that case does not really refer to
stdin in the classical sense but tells the backend to get the data from
the same stream the command came from (namely the network connection), and
that's what we're doing.

copy x from stdout;

The output goes to whereever select * from x would go to, in particular \o
affects this. This is purely because I said so, but I think it's
reasonable.

\copy x from stdin

The input comes from psql's stdin. (Which is correcter in this case since
it's a _frontend_ copy.)

\copy x to stdout

psql's stdout

I hope everyone's happy now. ;)

On 2000-01-12, Brook Milligan mentioned:

> But there are specifically two kinds of input involved here [*]:
>
> - input of SQL commands and such to psql
> - input of data to a COPY command
>
> To me these are conceptually very distinct (in much the same way you
> have distinguished already between various output streams; in fact,
> I'm not sure how you have matched those with the output stream from
> COPY, but it might be relevant to think about that in light of this
> discussion). Thus, to me it makes sense to say "take input from there
> and there," as long as it is clear that one "there" refers to one
> input stream and the other to the other one. For example, -f
> naturally refers to the first one above, while the STDIN naturally
> refers to the second.

--
Peter Eisentraut Sernanders väg 10:115
peter_e(at)gmx(dot)net 75262 Uppsala
http://yi.org/peter-e/ Sweden