Re: Add encoding support to COPY

Lists: pgsql-hackers
From: David Blewett <david(at)dawninglight(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Add encoding support to COPY
Date: 2009-07-15 15:51:46
Message-ID: 9d1f8d830907150851u555e8445w4424584a9d14410@mail.gmail.com

Today on IRC, someone was wondering what the preferred method of
exporting data in a specific encoding via COPY was. The reply was
wrapping the COPY command in "set client_encoding='foo';", which made
me wonder how hard it would be to add an additional WITH parameter to
the actual COPY statement to specify the encoding, a la:
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ ENCODING [ AS ] 'charset' ]
          [ NULL [ AS ] 'null string' ]
          [ CSV [ HEADER ]
                [ QUOTE [ AS ] 'quote' ]
                [ ESCAPE [ AS ] 'escape' ]
                [ FORCE QUOTE column [, ...] ]

Any objections? It seems like a cleaner solution client side than
issuing multiple calls to set the client_encoding. If there are no
objections, I can attempt to prepare a patch for the next commitfest.

David Blewett
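
For readers unfamiliar with the "set client_encoding dance" mentioned above, here is a minimal sketch of the statement sequence a client would issue around one COPY. The function name encoding_dance, the table name mytable and the encodings are illustrative assumptions, not anything from the thread, and the code only builds the statements rather than talking to a server.

```python
from typing import List


def encoding_dance(copy_sql: str, export_encoding: str, session_encoding: str) -> List[str]:
    """Return the statements a client would issue around a single COPY.

    All names here are illustrative; nothing is sent to a server.
    """
    for name in (export_encoding, session_encoding):
        # Accept only plain names such as UTF8, LATIN1 or EUC_JP so the
        # value can be placed inside single quotes without escaping.
        if not name.replace("_", "").replace("-", "").isalnum():
            raise ValueError(f"unexpected encoding name: {name!r}")
    return [
        f"SET client_encoding = '{export_encoding}'",   # switch for the export
        copy_sql,                                        # rows are converted to export_encoding
        f"SET client_encoding = '{session_encoding}'",  # switch back afterwards
    ]


for statement in encoding_dance("COPY mytable TO STDOUT", "LATIN1", "UTF8"):
    print(statement + ";")
```

The proposed ENCODING option would fold the first and last statements into the COPY itself.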


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Blewett <david(at)dawninglight(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 16:04:15
Message-ID: 23838.1247673855@sss.pgh.pa.us

David Blewett <david(at)dawninglight(dot)net> writes:
> Today on IRC, someone was wondering what the preferred method of
> exporting data in a specific encoding via COPY was. The reply was
> wrapping the COPY command in "set client_encoding='foo';", which made
> me wonder how hard it would be to add an additional WITH parameter to
> the actual COPY statement to specify the encoding, a la:

What is the point? You'd generally have client_encoding set correctly
for your usage anyway, and if you did not, the data could confuse your
client-side code terribly. Offering an option to let the backend send
data in the "wrong" encoding does NOT seem like a good idea to me.

regards, tom lane


From: David Blewett <david(at)dawninglight(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 16:08:32
Message-ID: 9d1f8d830907150908je524b9ey9a5cb7b40650025d@mail.gmail.com

On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> What is the point?  You'd generally have client_encoding set correctly
> for your usage anyway, and if you did not, the data could confuse your
> client-side code terribly.  Offering an option to let the backend send
> data in the "wrong" encoding does NOT seem like a good idea to me.

The use case was that the client connection was using one encoding,
but needed to output the file in a different encoding. So they would
have to do the "set client_encoding" dance each time they wanted to
export the file. I don't see how it's "wrong", especially considering
there is already a method to do this, albeit cumbersome. I consider it
simply syntactic sugar over existing functionality.

David Blewett


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Blewett <david(at)dawninglight(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 16:17:29
Message-ID: 24121.1247674649@sss.pgh.pa.us

David Blewett <david(at)dawninglight(dot)net> writes:
> On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What is the point? You'd generally have client_encoding set correctly
>> for your usage anyway, and if you did not, the data could confuse your
>> client-side code terribly. Offering an option to let the backend send
>> data in the "wrong" encoding does NOT seem like a good idea to me.

> The use case was that the client connection was using one encoding,
> but needed to output the file in a different encoding. So they would
> have to do the "set client_encoding" dance each time they wanted to
> export the file.

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination. I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.

regards, tom lane


From: Nagy Karoly Gabriel <nagy(dot)karoly(at)expert-erp(dot)net>
To: David Blewett <david(at)dawninglight(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 16:59:56
Message-ID: 4A5E0B0C.4070408@expert-erp.net

David Blewett wrote:
> Today on IRC, someone was wondering what the preferred method of
> exporting data in a specific encoding via COPY was. The reply was
> wrapping the COPY command in "set client_encoding='foo';", which made
> me wonder how hard it would be to add an additional WITH parameter to
> the actual COPY statement to specify the encoding, a la:
> [ [ WITH ]
> [ BINARY ]
> [ OIDS ]
> [ DELIMITER [ AS ] 'delimiter' ]
> [ ENCODING [ AS ] 'charset' ]
> [ NULL [ AS ] 'null string' ]
> [ CSV [ HEADER ]
> [ QUOTE [ AS ] 'quote' ]
> [ ESCAPE [ AS ] 'escape' ]
> [ FORCE QUOTE column [, ...] ]
>
> Any objections? It seems like a cleaner solution client side than
> issuing multiple calls to set the client_encoding. If there are no
> objections, I can attempt to prepare a patch for the next commitfest.
>
> David Blewett
>
I think that I was the one who wondered about that. Our use case is
related to moving data between different servers which have different
encodings. Of course the encoding should be an option only when COPY
involves files.

--
Nagy Karoly Gabriel
Expert Software Group SRL

(o__ 417495 Sanmartin nr. 205
//\' Bihor, Romania
V_/_ Tel./Fax: +4 0259 317 142, +4 0259 317 143


From: David Blewett <david(at)dawninglight(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 17:53:01
Message-ID: 9d1f8d830907151053r270f6b65v2162d6853b30294@mail.gmail.com

Apologies to Tom for the duplicate...

On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, it might make sense to allow an ENCODING option attached to a COPY
> with a file source/destination.  I remain of the opinion that overriding
> client_encoding on a transfer to/from the client is a bad idea.

I really don't see how it is any different from manually flipping the
client_encoding before/after the transfer. We could of course put a
warning sign in the docs, but it seems to me it's more error prone for
clients to set the client_encoding manually rather than include an
option for a single command. What happens if an exception is thrown
during the COPY process and the client doesn't handle things
correctly? The rest of their session could be in an unexpected
encoding, whereas with this method we know to return to the original
client_encoding before doing anything else. By including the encoding
option, they're explicitly saying how they want to handle the data.

I could see a use case for remote client code to do a COPY to STDOUT,
that is actually being redirected to a file. If the consensus is for
local file-based operations only, however, I can structure the patch
that way.

David
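
The failure mode described above (an error during COPY leaving the session in the wrong encoding) can be illustrated with a small sketch. It assumes a DB-API-style cursor on a connection in autocommit mode, since inside an aborted transaction the restoring SET would itself fail. RecordingCursor is a hypothetical stand-in that records statements instead of running them, and temporary_client_encoding is an illustrative name, not an existing driver API.

```python
from contextlib import contextmanager


class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor; records statements only."""

    def __init__(self, current_encoding="UTF8"):
        self.current_encoding = current_encoding
        self.statements = []
        self._row = None

    def execute(self, sql):
        self.statements.append(sql)
        if sql == "SHOW client_encoding":
            self._row = (self.current_encoding,)
        elif sql.startswith("SET client_encoding = '"):
            # Mimic the server by remembering the newly requested encoding.
            self.current_encoding = sql.split("'")[1]

    def fetchone(self):
        return self._row


@contextmanager
def temporary_client_encoding(cursor, encoding):
    """Switch client_encoding for one block and restore it even on error."""
    if not encoding.replace("_", "").replace("-", "").isalnum():
        raise ValueError(f"unexpected encoding name: {encoding!r}")
    cursor.execute("SHOW client_encoding")
    previous = cursor.fetchone()[0]
    cursor.execute(f"SET client_encoding = '{encoding}'")
    try:
        yield cursor
    finally:
        # Runs on success and on exceptions alike (autocommit assumed).
        cursor.execute(f"SET client_encoding = '{previous}'")


cur = RecordingCursor("UTF8")
try:
    with temporary_client_encoding(cur, "LATIN1"):
        cur.execute("COPY mytable TO STDOUT")
        raise RuntimeError("simulated failure during COPY")
except RuntimeError:
    pass
print(cur.current_encoding)
```

A client that skips the try/finally is exactly the case the message worries about; a per-statement option would make the restore step unnecessary.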


From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: nagy(dot)karoly(at)expert-erp(dot)net, David Blewett <david(at)dawninglight(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 18:05:28
Message-ID: DA73C15AB253C0F7F9E6DCC9@amenophis

--On 15 July 2009 19:59:56 +0300 Nagy Karoly Gabriel
<nagy(dot)karoly(at)expert-erp(dot)net> wrote:

> I think that I was the one who wondered about that. Our use case is
> related to moving data between different servers which have different
> encodings. Of course the encoding should be an option only when COPY
> involves files.

I find this rather confusing: can't you just use client_encoding to
declare the encoding of your file during restore?

--
Thanks

Bernd


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: David Blewett <david(at)dawninglight(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 18:20:16
Message-ID: 20090715182016.GM4551@alvh.no-ip.org

David Blewett wrote:

> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Well, it might make sense to allow an ENCODING option attached to a COPY
> > with a file source/destination.  I remain of the opinion that overriding
> > client_encoding on a transfer to/from the client is a bad idea.

> I could see a use case for remote client code to do a COPY to STDOUT,
> that is actually being redirected to a file. If the consensus is for
> local file-based operations only, however, I can structure the patch
> that way.

Yeah, the problem is that reading from or writing to files is only
allowed for superusers ...

(I'm not sure how this affects \copy in psql; probably something you
should investigate)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Blewett <david(at)dawninglight(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 20:07:09
Message-ID: 1774.1247688429@sss.pgh.pa.us

David Blewett <david(at)dawninglight(dot)net> writes:
> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, it might make sense to allow an ENCODING option attached to a COPY
>> with a file source/destination. I remain of the opinion that overriding
>> client_encoding on a transfer to/from the client is a bad idea.

> I really don't see how it is any different from manually flipping the
> client_encoding before/after the transfer.

The difference is that the client-side code gets told that the encoding
changed if you do the latter.

regards, tom lane


From: David Blewett <david(at)dawninglight(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 20:19:55
Message-ID: 9d1f8d830907151319u69e5559du6bfc619455bfb265@mail.gmail.com

On Wed, Jul 15, 2009 at 4:07 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Blewett <david(at)dawninglight(dot)net> writes:
>> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Well, it might make sense to allow an ENCODING option attached to a COPY
>>> with a file source/destination.  I remain of the opinion that overriding
>>> client_encoding on a transfer to/from the client is a bad idea.
>
>> I really don't see how it is any different from manually flipping the
>> client_encoding before/after the transfer.
>
> The difference is that the client-side code gets told that the encoding
> changed if you do the latter.

Do you mean at the protocol level?

All I was planning on having the patch do is the equivalent of the set
client_encoding dance. Wouldn't that be sufficient to notify the client
of the encoding change?

David Blewett
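
As background to the protocol-level question: in version 3 of the frontend/backend protocol, the server sends a ParameterStatus message when one of a small set of reported parameters changes, and client_encoding is among them. The message is the byte 'S', an Int32 length that counts itself but not the type byte, then the parameter name and value as null-terminated strings. The sketch below parses and builds such a message; the function names are illustrative, and it is not taken from libpq or any driver.

```python
import struct
from typing import Tuple


def build_parameter_status(name: str, value: str) -> bytes:
    """Build a ParameterStatus message as the server would send it."""
    body = name.encode("ascii") + b"\x00" + value.encode("ascii") + b"\x00"
    # The length field covers itself (4 bytes) plus the body.
    return b"S" + struct.pack("!i", len(body) + 4) + body


def parse_parameter_status(message: bytes) -> Tuple[str, str]:
    """Parse one ParameterStatus message into (name, value)."""
    if message[:1] != b"S":
        raise ValueError("not a ParameterStatus message")
    (length,) = struct.unpack("!i", message[1:5])
    body = message[5:1 + length]
    # Two null-terminated strings; the final split piece is empty.
    name, value, _rest = body.split(b"\x00", 2)
    return name.decode("ascii"), value.decode("ascii")


wire = build_parameter_status("client_encoding", "LATIN1")
print(parse_parameter_status(wire))
```

A client library that tracks these messages learns about an explicit SET client_encoding, which is the difference being pointed out; whether a per-COPY option would trigger the same notification depends on how the patch is implemented.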


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: nagy(dot)karoly(at)expert-erp(dot)net
Cc: David Blewett <david(at)dawninglight(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add encoding support to COPY
Date: 2009-07-15 20:40:14
Message-ID: 4A5E3EAE.6070402@dunslane.net

Nagy Karoly Gabriel wrote:
> David Blewett wrote:
>> Today on IRC, someone was wondering what the preferred method of
>> exporting data in a specific encoding via COPY was. The reply was
>> wrapping the COPY command in "set client_encoding='foo';", which made
>> me wonder how hard it would be to add an additional WITH parameter to
>> the actual COPY statement to specify the encoding, a la:
>>
> I think that I was the one who wondered about that. Our use case is
> related to moving data between different servers which have different
> encodings. Of course the encoding should be an option only when COPY
> involves files.
>

Well, that is the case that there seems to be consensus about, and it's
also the case that can't be done via client encoding. We tend to have a
bias against providing lots of ways to do the same thing, so let's go
with this case (i.e. do it for cases other than STDIN/STDOUT).

cheers

andrew
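
The rule the thread settles on (accept ENCODING only when COPY reads or writes a file, not STDIN/STDOUT) could be expressed as a check along these lines. This is only a sketch of the agreed behaviour with a hypothetical function name; it does not reflect how the backend's COPY code is structured.

```python
from typing import Optional


def check_encoding_option(target: str, encoding: Optional[str]) -> None:
    """Reject ENCODING when COPY transfers data to or from the client."""
    to_client = target.strip().upper() in ("STDIN", "STDOUT")
    if encoding is not None and to_client:
        raise ValueError(
            "ENCODING is only allowed with a file source/destination"
        )


# A file destination with an explicit encoding passes the check.
check_encoding_option("'/tmp/export.csv'", "LATIN1")

# A transfer to the client is rejected, leaving client_encoding in charge.
try:
    check_encoding_option("STDOUT", "LATIN1")
except ValueError as err:
    print(err)
```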