Re: pg_dump/restore encoding woes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump/restore encoding woes
Date: 2013-08-26 15:59:02
Message-ID: 66480.1377532742@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> When client encoding is not specified explicitly with the -E option, or
> PGCLIENTENCODING env variable, the dump is created in the server encoding.

Yeah, that's intentional as I recall.

> However, pg_dump is special, because client encoding affects not only
> the encoding used to speak to the server, but it also determines how the
> resulting dump is encoded. If you have a UTF-8 server, and a LATIN1
> console, there is no way to get a UTF-8 encoded dump of a single table
> which has non-ASCII characters in its name. There is a good reason to
> want to dump in the server encoding regardless of the encoding of the
> client: that avoids the costly encoding conversion during the dump, and
> very likely another conversion back on restore. (as a convenience, it
> would be nice if you could specify "-E server" to mean "same as server
> encoding")

There's a considerably more compelling reason than speed to default to
avoiding a conversion: doing a conversion carries significant risk of
outright failure, due to not being able to convert some data character
to the client character set.
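
To illustrate that failure mode, here is a minimal Python sketch. It uses Python's own codecs as a stand-in for the server's conversion routines (an assumption for illustration only; it is not PostgreSQL's conversion code), and the sample rows are made up.

```python
def can_convert(text: str, client_encoding: str) -> bool:
    """Return True if every character of text exists in client_encoding."""
    try:
        text.encode(client_encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical data stored in a UTF-8 database.
rows = ["café", "naïve", "price: 10 €", "日本語"]

# Rows that a dump converted to LATIN1 could not represent at all.
unconvertible = [r for r in rows if not can_convert(r, "latin-1")]
print(unconvertible)
```

A single such row is enough to make the conversion fail, whereas dumping in the server encoding needs no conversion at all.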

> The pg_dump -E option just sets client_encoding, but I think it would be
> better for -E to only set the encoding used in the dump, and for the
> PGCLIENTENCODING env variable (if set) to determine the encoding of the
> command-line arguments. Opinions?

I think this is going to be a lot easier said than done, but feel free
to see if you can make it work. (As you point out, we don't have
any client-side encoding conversion infrastructure, but I don't see
how you're going to make this work without it.)

A second issue is whether we should divorce -E and PGCLIENTENCODING like
that, when they have always meant the same thing. You mentioned the
alternative of looking at pg_dump's locale environment to determine the
command line encoding --- would that be better?
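
As a rough sketch of what the locale-based alternative would involve on the client side, the Python below converts a command-line argument from the console encoding to the dump encoding. The function names are hypothetical, Python's codecs stand in for the client-side conversion infrastructure we don't have, and the table name is invented.

```python
import locale


def locale_encoding() -> str:
    """Encoding implied by the process's locale environment."""
    return locale.getpreferredencoding(False)


def argument_to_dump_encoding(raw_arg: bytes, cmdline_encoding: str,
                              dump_encoding: str) -> bytes:
    """Re-encode a raw command-line argument into the dump encoding."""
    return raw_arg.decode(cmdline_encoding).encode(dump_encoding)


# A table name as typed on a LATIN1 console (hypothetical name).
latin1_arg = "tämä".encode("latin-1")

# The same name as bytes suitable for a UTF-8 dump.
converted = argument_to_dump_encoding(latin1_arg, "latin-1", "utf-8")
print(converted)
```

In a real tool the second argument would come from something like locale_encoding() rather than being hard-coded; it is fixed here only so the example behaves the same everywhere.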

regards, tom lane
