encoding confusion with \copy command

From: Martin Waite <waite(dot)134(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: encoding confusion with \copy command
Date: 2014-09-17 10:03:42
Message-ID: CAOWKicvhP+qw41OWTk=eZXEssOsMrsaRQRcXbCOPmEsbbCwZFw@mail.gmail.com
Lists: pgsql-general

Hi,

I have a PostgreSQL 9.3 server and client on CentOS 6.4. The database
server uses UTF-8 encoding.

I have been exploring the use of the \copy command for importing CSV data
generated by SQL Server 2008. The SQL Server 2008 export tool does not escape
quotes that appear in field content, so it is useful to be able to specify
an obscure character as the QUOTE option of the \copy command to work
around this issue.

When I run the following commands in psql, I am surprised that QUOTE is
limited to characters in the range 0x01 - 0x7f, and that UTF8 is mentioned
in the error message if characters outside the range are chosen:

\encoding WIN1252
\copy yuml from '/tmp/yuml.csv' WITH CSV HEADER ENCODING 'WIN1252' QUOTE as E'\xff';
ERROR: invalid byte sequence for encoding "UTF8": 0xff

I thought that if the client (psql) is WIN1252, and the CSV file is
specified as WIN1252, then I could specify any valid WIN1252 character as
the quote character. Instead, I am limited to the range of characters
that can be encoded as a single byte in UTF-8. Actually, 0x00 is not
accepted either, so the range is 0x01 - 0x7F.
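The range described above can be checked mechanically. The Python sketch below is only an illustration, not part of PostgreSQL; the function name `single_byte_in_utf8` is hypothetical. It decodes each byte from 0x01 to 0xFF as WIN1252 using Python's `cp1252` codec and keeps the bytes whose character re-encodes to exactly one UTF-8 byte. It assumes Python 3.9+ and that Python's `cp1252` table agrees with PostgreSQL's WIN1252 for the positions it defines.

```python
def single_byte_in_utf8(source_encoding: str = "cp1252") -> list[int]:
    """Hypothetical helper: return the byte values 0x01-0xFF whose character
    in source_encoding re-encodes to exactly one UTF-8 byte."""
    usable = []
    for b in range(0x01, 0x100):  # 0x00 skipped: reported as rejected anyway
        try:
            ch = bytes([b]).decode(source_encoding)
        except UnicodeDecodeError:
            # A few WIN1252 positions (e.g. 0x81) are undefined in Python's codec.
            continue
        if len(ch.encode("utf-8")) == 1:
            usable.append(b)
    return usable


usable = single_byte_in_utf8()
print(hex(min(usable)), hex(max(usable)), len(usable))  # 0x1 0x7f 127
```

Only the ASCII range survives. For example, 0xFF is ÿ (U+00FF) in WIN1252, which UTF-8 stores as the two bytes 0xC3 0xBF.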

Is this a bug or expected behaviour?

Is it the case that the server does the actual CSV parsing and that,
because my server uses UTF8, I am therefore limited to single-byte UTF8
characters?
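The two failure paths implied by this question can be modelled side by side. This is a hypothetical Python model of the hypothesis, not PostgreSQL's implementation; the function name `model_quote_check` and its result strings are illustrative only. It assumes the quote character must end up as one non-NUL byte in the UTF8 server encoding: an `E'\xff'` escape supplies the raw byte directly (and a lone 0xFF is not valid UTF-8), while a literal character typed in the WIN1252 client encoding is first converted to UTF-8, where ÿ becomes two bytes.

```python
def model_quote_check(quote: bytes, client_encoding: str = "cp1252",
                      raw_escape: bool = False) -> str:
    """Hypothetical model: would `quote` survive as a one-byte quote
    character in a UTF-8 server encoding?"""
    if raw_escape:
        # An E'\xNN' escape yields raw bytes that must already be valid UTF-8.
        server_bytes = quote
        try:
            server_bytes.decode("utf-8")
        except UnicodeDecodeError:
            return "invalid UTF8 byte sequence"
    else:
        # A literal character typed in the client encoding is converted to UTF-8.
        try:
            server_bytes = quote.decode(client_encoding).encode("utf-8")
        except UnicodeDecodeError:
            return "undefined in client encoding"
    if server_bytes == b"\x00":
        return "NUL not allowed"
    if len(server_bytes) != 1:
        return "not a single one-byte character"
    return "accepted"


print(model_quote_check(b"\xff", raw_escape=True))  # invalid UTF8 byte sequence
print(model_quote_check(b"\xff"))                   # not a single one-byte character
print(model_quote_check(b"~"))                      # accepted
```

Under this model neither spelling of 0xFF can work with a UTF8 server, which matches the behaviour reported above.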

regards,
Martin
